Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Full Text Available Abstract Background Surveys of doctors are an important data collection method in health services research. Ways to improve response rates, minimise survey response bias and item non-response, within a given budget, have not previously been addressed in the same study. The aim of this paper is to compare the effects and costs of three different modes of survey administration in a national survey of doctors. Methods A stratified random sample of 4.9% (2,702/54,160 of doctors undertaking clinical practice was drawn from a national directory of all doctors in Australia. Stratification was by four doctor types: general practitioners, specialists, specialists-in-training, and hospital non-specialists, and by six rural/remote categories. A three-arm parallel trial design with equal randomisation across arms was used. Doctors were randomly allocated to: online questionnaire (902; simultaneous mixed mode (a paper questionnaire and login details sent together (900; or, sequential mixed mode (online followed by a paper questionnaire with the reminder (900. Analysis was by intention to treat, as within each primary mode, doctors could choose either paper or online. Primary outcome measures were response rate, survey response bias, item non-response, and cost. Results The online mode had a response rate 12.95%, followed by the simultaneous mixed mode with 19.7%, and the sequential mixed mode with 20.7%. After adjusting for observed differences between the groups, the online mode had a 7 percentage point lower response rate compared to the simultaneous mixed mode, and a 7.7 percentage point lower response rate compared to sequential mixed mode. The difference in response rate between the sequential and simultaneous modes was not statistically significant. Both mixed modes showed evidence of response bias, whilst the characteristics of online respondents were similar to the population. However, the online mode had a higher rate of item non-response compared
Full Text Available Abstract Background Health surveys provide important information on the burden and secular trends of risk factors and disease. Several factors including survey and item non-response can affect data quality. There are few reports on efficiency, validity and the impact of item non-response, from developing countries. This report examines factors associated with item non-response and study efficiency in a national health survey in a developing Caribbean island. Methods A national sample of participants aged 15–74 years was selected in a multi-stage sampling design accounting for 4 health regions and 14 parishes using enumeration districts as primary sampling units. Means and proportions of the variables of interest were compared between various categories. Non-response was defined as failure to provide an analyzable response. Linear and logistic regression models accounting for sample design and post-stratification weighting were used to identify independent correlates of recruitment efficiency and item non-response. Results We recruited 2012 15–74 year-olds (66.2% females at a response rate of 87.6% with significant variation between regions (80.9% to 97.6%; p Conclusion Informative health surveys are possible in developing countries. While survey response rates may be satisfactory, item non-response was high in respect of income and sexual practice. In contrast to developed countries, non-response to questions on income is higher and has different correlates. These findings can inform future surveys.
Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.
This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
Yang, Ji Seung; Zheng, Xiaying
The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
The sequential model can be used to describe the variable resulting from a sequential scoring process. In this paper two more item response models are investigated with respect to their suitability for sequential scoring: the partial credit model and the graded response model. The investigation is
Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.
Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a selected-response format. This chapter presents a short overview of how item response theory and generalizability theory were integrated to model such assessments. Further, the precision of the esti...
Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.
Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
Brodey, Benjamin B; Gonzalez, Nicole L; Elkin, Kathryn Ann; Sasiela, W Jordan; Brodey, Inger S
The computerized administration of self-report psychiatric diagnostic and outcomes assessments has risen in popularity. If results are similar enough across different administration modalities, then new administration technologies can be used interchangeably and the choice of technology can be based on other factors, such as convenience in the study design. An assessment based on item response theory (IRT), such as the Patient-Reported Outcomes Measurement Information System (PROMIS) depression item bank, offers new possibilities for assessing the effect of technology choice upon results. To create equivalent halves of the PROMIS depression item bank and to use these halves to compare survey responses and user satisfaction among administration modalities-paper, mobile phone, or tablet-with a community mental health care population. The 28 PROMIS depression items were divided into 2 halves based on content and simulations with an established PROMIS response data set. A total of 129 participants were recruited from an outpatient public sector mental health clinic based in Memphis. All participants took both nonoverlapping halves of the PROMIS IRT-based depression items (Part A and Part B): once using paper and pencil, and once using either a mobile phone or tablet. An 8-cell randomization was done on technology used, order of technologies used, and order of PROMIS Parts A and B. Both Parts A and B were administered as fixed-length assessments and both were scored using published PROMIS IRT parameters and algorithms. All 129 participants received either Part A or B via paper assessment. Participants were also administered the opposite assessment, 63 using a mobile phone and 66 using a tablet. There was no significant difference in item response scores for Part A versus B. All 3 of the technologies yielded essentially identical assessment results and equivalent satisfaction levels. Our findings show that the PROMIS depression assessment can be divided into 2 equivalent
Full Text Available Abstract Background Weight control behaviors are common among young people and are associated with poor health outcomes. Yet clinicians rarely ask young people about their weight control; this may be due to uncertainty about which questions to ask, specifically around whether certain weight loss strategies are healthier or unhealthy or about what weight loss behaviors are more likely to lead to adverse outcomes. Thus, the aims of the current study are: to confirm, using item response theory analysis, that the underlying latent constructs of healthy and unhealthy weight control exist; to determine the ‘red flag’ weight loss behaviors that may discriminate unhealthy from healthy weight loss; to determine the relationships between healthy and unhealthy weight loss and mental health; and to examine how weight control may vary among demographic groups. Methods Data were collected as part of a national health and wellbeing survey of secondary school students in New Zealand (n = 9,107 in 2007. Item response theory analyses were conducted to determine the underlying constructs of weight control behaviors and the behaviors that discriminate unhealthy from healthy weight control. Results The current study confirms that there are two underlying constructs of weight loss behaviors which can be described as healthy and unhealthy weight control. Unhealthy weight control was positively correlated with depressive mood. Fasting and skipping meals for weight loss had the lowest item thresholds on the unhealthy weight control continuum, indicating that they act as ‘red flags’ and warrant further discussion in routine clinical assessments. Conclusions Routine assessments of weight control strategies by clinicians are warranted, particularly for screening for meal skipping and fasting for weight loss as these behaviors appear to ‘flag’ behaviors that are associated with poor mental wellbeing.
Gideon P. De Bruin
Full Text Available The factor analysis of items often produces spurious results in the sense that unidimensional scales appear multidimensional. This may be ascribed to failure in meeting the assumptions of linearity and normality on which factor analysis is based. Item response theory is explicitly designed for the modelling of the non-linear relations between ordinal variables and provides a strong alternative to the factor analysis of items. Items may also be combined in parcels that are more likely to satisfy the assumptions of factor analysis than do the items. The use of the Rasch rating scale model and the factor analysis of parcels is illustrated with data obtained with the Locus of Control Inventory. The results of these analyses are compared with the results obtained through the factor analysis of items. It is shown that the Rasch rating scale model and the factoring of parcels produce superior results to the factor analysis of items. Recommendations for the analysis of scales are made. Opsomming Die faktorontleding van items lewer dikwels misleidende resultate op, veral in die opsig dat eendimensionele skale as meerdimensioneel voorkom. Hierdie resultate kan dikwels daaraan toegeskryf word dat daar nie aan die aannames van lineariteit en normaliteit waarop faktorontleding berus, voldoen word nie. Itemresponsteorie, wat eksplisiet vir die modellering van die nie-liniêre verbande tussen ordinale items ontwerp is, bied ’n aantreklike alternatief vir die faktorontleding van items. Items kan ook in pakkies gegroepeer word wat meer waarskynlik aan die aannames van faktorontleding voldoen as individuele items. Die gebruik van die Rasch beoordelingskaalmodel en die faktorontleding van pakkies word aan die hand van data wat met die Lokus van Beheervraelys verkry is, gedemonstreer. Die resultate van hierdie ontledings word vergelyk met die resultate wat deur ‘n faktorontleding van die individuele items verkry is. Die resultate dui daarop dat die Rasch
Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar
The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Responsible Models available to measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related with developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Lindem & Hambleton, 1997). As stated before the Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
Item response theory (IRT) is a framework for modeling and analyzing item response data. Item-level modeling gives IRT advantages over classical test theory. The fit of an item score pattern to an item response theory (IRT) models is a necessary condition that must be assessed for further use of item and models that best fit ...
Fox, Gerardus J.A.
The randomized response (RR) technique is often used to obtain answers on sensitive questions. A new method is developed to measure latent variables using the RR technique because direct questioning leads to biased results. Within the RR technique is the probability of the true response modeled by
Rijmen, Frank; Jeon, Minjeong; von Davier, Matthias; Rabe-Hesketh, Sophia
Second-order item response theory models have been used for assessments consisting of several domains, such as content areas. We extend the second-order model to a third-order model for assessments that include subdomains nested in domains. Using a graphical model framework, it is shown how the model does not suffer from the curse of…
Wang, Jing; Bao, Lei
Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
James A Wiley
Full Text Available We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.
Full Text Available Item response theory (IRT becomes an increasingly important tool when analyzing “big data” gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data, and then proceeds to investigate the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination is robust over a wide range with respect to model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
Goddard, Chase; Davis, Jeremy; Pyper, Brian
We are interested to see if Item Response Theory can help to better inform the development of reasoning ability in introductory physics. A first pass through our latest batch of data from the Heat and Temperature Conceptual Evaluation, the Lawson Classroom Test of Scientific Reasoning, and the Epistemological Beliefs About Physics Survey may help in this effort.
Warne, Russell T.; McKyer, E. J. Lisako; Smith, Matthew L.
Objective: To introduce item response theory (IRT) to health behavior researchers by contrasting it with classical test theory and providing an example of IRT in health behavior. Method: Demonstrate IRT by fitting the 2PL model to substance-use survey data from the Adolescent Health Risk Behavior questionnaire (n = 1343 adolescents). Results: An…
Item response theory (IRT) becomes an increasingly important tool when analyzing "big data" gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large-enrollment physics course for…
With the development in computing technology, item response theory (IRT) develops rapidly, and has become a user friendly application in psychometrics world. Limitation in classical theory is one aspect that encourages the use of IRT. In this study, the basic concept of IRT will be discussed. In addition, it will briefly review the ability…
Uto, Masaki; Ueno, Maomi
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Penfield, Randall David
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
Fukuhara, Hirotaka; Kamata, Akihito
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Eutalia Aparecida Candido de Araujo
Full Text Available A preocupação com medidas de traços psicológicos é antiga, sendo que muitos estudos e propostas de métodos foram desenvolvidos no sentido de alcançar este objetivo. Entre os trabalhos propostos, destaca-se a Teoria da Resposta ao Item (TRI que, a princípio, veio completar limitações da Teoria Clássica de Medidas, empregada em larga escala até hoje na medida de traços psicológicos. O ponto principal da TRI é que ela leva em consideração o item particularmente, sem relevar os escores totais; portanto, as conclusões não dependem apenas do teste ou questionário, mas de cada item que o compõe. Este artigo propõe-se a apresentar esta Teoria que revolucionou a teoria de medidas.La preocupación con las medidas de los rasgos psicológicos es antigua y muchos estudios y propuestas de métodos fueron desarrollados para lograr este objetivo. Entre estas propuestas de trabajo se incluye la Teoría de la Respuesta al Ítem (TRI que, en principio, vino a completar las limitaciones de la Teoría Clásica de los Tests, ampliamente utilizada hasta hoy en la medida de los rasgos psicológicos. El punto principal de la TRI es que se tiene en cuenta el punto concreto, sin relevar las puntuaciones totales; por lo tanto, los resultados no sólo dependen de la prueba o cuestionario, sino que de cada ítem que lo compone. En este artículo se propone presentar la Teoría que revolucionó la teoría de medidas.The concern with measures of psychological traits is old and many studies and proposals of methods were developed to achieve this goal. Among these proposed methods highlights the Item Response Theory (IRT that, in principle, came to complete limitations of the Classical Test Theory, which is widely used until nowadays in the measurement of psychological traits. The main point of IRT is that it takes into account the item in particular, not relieving the total scores; therefore, the findings do not only depend on the test or questionnaire
Weidmer, Beverly A; Brach, Cindy; Hays, Ron D
The complexity of health information often exceeds patients' skills to understand and use it. To develop survey items assessing how well healthcare providers communicate health information. Domains and items for the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Item Set for Addressing Health Literacy were identified through an environmental scan and input from stakeholders. The draft item set was translated into Spanish and pretested in both English and Spanish. The revised item set was field tested with a randomly selected sample of adult patients from 2 sites using mail and telephonic data collection. Item-scale correlations, confirmatory factor analysis, and internal consistency reliability estimates were estimated to assess how well the survey items performed and identify composite measures. Finally, we regressed the CAHPS global rating of the provider item on the CAHPS core communication composite and the new health literacy composites. A total of 601 completed surveys were obtained (52% response rate). Two composite measures were identified: (1) Communication to Improve Health Literacy (16 items); and (2) How Well Providers Communicate About Medicines (6 items). These 2 composites were significantly uniquely associated with the global rating of the provider (communication to improve health literacy: PLiteracy composite accounted for 90% of the variance of the original 16-item composite. This study provides support for reliability and validity of the CAHPS Item Set for Addressing Health Literacy. These items can serve to assess whether healthcare providers have communicated effectively with their patients and as a tool for quality improvement.
Aybek, Eren Can; Demirtasli, R. Nukhet
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A
The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
de Jong, Martijn G.; Steenkamp, Jan-Benedict E.M.; Fox, Gerardus J.A.; Baumgartner, Hans
Extreme response style (ERS) is an important threat to the validity of survey-based marketing research. In this article, the authors present a new item response theory–based model for measuring ERS. This model contributes to the ERS literature in two ways. First, the method improves on existing
Trotman-Dickenson, D. I.
Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
Cher Wong, Cheow
Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Sahin, Alper; Anil, Duygu
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
Arce-Ferrer, Alvaro J.; Bulut, Okan
This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua
A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
A coordinate system free definition of complex structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the ``classical'' unidimensional item response theory models. The main theorem of the paper is that every monotonic MIRT model looks the same; they are all trivial extensions of univariate item response theory.
Using a Multivariate Multilevel Polytomous Item Response Theory Model to Study Parallel Processes of Change: The Dynamic Association between Adolescents' Social Isolation and Engagement with Delinquent Peers in the National Youth Survey
Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.
The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…
Jin, Kuan-Yu; Wang, Wen-Chung
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principle component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principle component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms
Baker, Frank B
This graduate-level textbook is a tutorial for item response theory that covers both the basics of item response theory and the use of R for preparing graphical presentation in writings about the theory. Item response theory has become one of the most powerful tools used in test construction, yet one of the barriers to learning and applying it is the considerable amount of sophisticated computational effort required to illustrate even the simplest concepts. This text provides the reader access to the basic concepts of item response theory freed of the tedious underlying calculations. It is intended for those who possess limited knowledge of educational measurement and psychometrics. Rather than presenting the full scope of item response theory, this textbook is concise and practical and presents basic concepts without becoming enmeshed in underlying mathematical and computational complexities. Clearly written text and succinct R code allow anyone familiar with statistical concepts to explore and apply item re...
fed set ofvaluesof a, b, AI , B1 A2 2 . 2 A3 , and 13 , the f ’. g ’a. nd h’a in (7) are fied. Equation (7) must still hold for S - e19029e3,..* . Thus...for Item I Is -- b ?(a:1 , b1 ,O) (1 + ’)(I + e4 (22 where a and pi are arbitrary constants. These constants mst be the sam for all Items In a given...NETHERLIS I E3I1 Focility-Acquisitions 4133 Rugby Avnue 1 Lee Cronbach Bethesda, NO 20014 16 Laburnue Road Atherton, CA 94205 1 Dr. Benjamin A. Fairbank
Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan
This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.
Yoon Soo ePark
Full Text Available This study investigates the impact of item parameter drift (IPD on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effect on item parameters and examinee ability.
This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California
Fox, J.P.; Mulder, J.; Sinharay, Sandip
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning
Décieux Jean Philippe
Full Text Available Online surveys have become a popular method for data gathering for many reasons, including low costs and the ability to collect data rapidly. However, online data collection is often conducted without adequate attention to implementation details. One example is the frequent use of the forced answering option, which forces the respondent to answer each question in order to proceed through the questionnaire. The avoidance of missing data is often the idea behind the use of the forced answering option. However, we suggest that the costs of a reactance effect in terms of quality reduction and unit nonresponse may be high because respondents typically have plausible reasons for not answering questions. The objective of the study reported in this paper was to test the influence of forced answering on dropout rates and data quality. The results show that requiring participants answer every question increases dropout rates and decreases quality of answers. Our findings suggest that the desire for a complete data set has to be balanced against the consequences of reduced data quality.
Kleinman, Marjorie; Teresi, Jeanne A
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
JOSEPH P. EIMICKE
Full Text Available The aims of this paper are to present findings related to differential item functioning (DIF in the Patient Reported Outcome Measurement Information System (PROMIS depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item, "I felt I had no energy," was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type 1 error was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals.
which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more...Service University of Ilinois Educacional Testing Service Rosedal Road Capign. IL 61801 Princeton. K3 08541 Princeton. N3 08541 Dr. Charles LeiS Dr
Yang, Ji Seung; Hansen, Mark; Cai, Li
Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…
Toland, Michael D.
Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.
Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…
Falk, Carl F.; Cai, Li
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
van der Linden, Willem J.
Several models for optimizing incomplete sample designs with respect to information on the item parameters are presented. The following cases are considered: (1) known ability parameters; (2) unknown ability parameters; (3) item sets with multiple ability scales; and (4) response models with
United States. Bonneville Power Administration. End-Use Research Section; Applied Management & Planning Group (Firm)
This book constitutes a portion of the primary documentation for the 1992 Pacific Northwest Residential Energy Survey, Phase I. The complete 33-volume set of primary documentation provides information needed by energy analysts and interpreters with respect to planning, execution, data collection, and data management of the PNWRES92-I process. Thirty of these volumes are devoted to different ``views`` of the data themselves, with each view having a special purpose or interest as its focus. Analyses and interpretations of these data will be the subjects of forthcoming publications. Conducted during the late summer and fall months of 1992, PNWRES92-I had the over-arching goal of satisfying basic requirements for a variety of information about the stock of residential units in Bonneville`s service region. Surveys with a similar goal were conducted in 1979 and 1983. This volume discerns the information by state. ``Selected crosstabulations`` refers to a set of nine survey items of wide interest (Dwelling Type, Ownership Type, Year-of-Construction, Dwelling Size, Primary Space-Heating Fuel, Primary Water-Heating Fuel, Household Income for 1991, Utility Type, and Space-Heating Fuels: Systems and Equipment) that were crosstabulated among themselves.
Full Text Available The role of response time in completing an item can have very different interpretations. Responding more slowly could be positively related to success as the item is answered more carefully. However, the association may be negative if working faster indicates higher ability. The objective of this study was to clarify the validity of each assumption for reasoning items considering the mode of processing. A total of 230 persons completed a computerized version of Raven’s Advanced Progressive Matrices test. Results revealed that response time overall had a negative effect. However, this effect was moderated by items and persons. For easy items and able persons the effect was strongly negative, for difficult items and less able persons it was less negative or even positive. The number of rules involved in a matrix problem proved to explain item difficulty significantly. Most importantly, a positive interaction effect between the number of rules and item response time indicated that the response time effect became less negative with an increasing number of rules. Moreover, exploratory analyses suggested that the error type influenced the response time effect.
Hateley, R. J.
Presents a pilot study on student thinking in chemistry. Verbal comments of a group of six college students were recorded and analyzed to identify how each student arrives at the correct answer in fixed response items in chemisty. (HM)
Ames, Allison J.; Penfield, Randall D.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing…
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A
Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process-wherein respondents agree only to items that reflect their own standing on the measured variable-as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Ip, Edward Hak-Sing; Chen, Shyh-Huei
The problem of fitting unidimensional item-response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that contains a major dimension of interest but that may also contain minor nuisance dimensions. Because fitting a unidimensional model to multidimensional data results in…
Rodriguez, Hector P; Crane, Paul K
Objective To evaluate psychometric properties of a widely used patient experience survey. Data Sources English-language responses to the Clinician & Group Consumer Assessment of Healthcare Providers and Systems (CG-CAHPS®) survey (n = 12,244) from a 2008 quality improvement initiative involving eight southern California medical groups. Methods We used an iterative hybrid ordinal logistic regression/item response theory differential item functioning (DIF) algorithm to identify items with DIF related to patient sociodemographic characteristics, duration of the physician–patient relationship, number of physician visits, and self-rated physical and mental health. We accounted for all sources of DIF and determined its cumulative impact. Principal Findings The upper end of the CG-CAHPS® performance range is measured with low precision. With sensitive settings, some items were found to have DIF. However, overall DIF impact was negligible, as 0.14 percent of participants had salient DIF impact. Latinos who spoke predominantly English at home had the highest prevalence of salient DIF impact at 0.26 percent. Conclusions The CG-CAHPS® functions similarly across commercially insured respondents from diverse backgrounds. Consequently, previously documented racial and ethnic group differences likely reflect true differences rather than measurement bias. The impact of low precision at the upper end of the scale should be clarified. PMID:22092021
Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi
Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
Deon P. de Bruin
Research purpose: The main focus of this study was to use the Rasch model to provide insight into the dimensionality of the UWES-17, and to assess whether work engagement should be interpreted as one single overall score, three separate scores, or a combination. Motivation for the study: It is unclear whether a summative score is more representative of work engagement or whether scores are more meaningful when interpreted for each dimension separately. Previous work relied on confirmatory factor analysis; the potential of item response models has not been tapped. Research design: A quantitative cross-sectional survey design approach was used. Participants, 2429 employees of a South African Information and Communication Technology (ICT company, completed the UWES-17. Main findings: Findings indicate that work engagement should be treated as a unidimensional construct: individual scores should be interpreted in a summative manner, giving a single global score. Practical/managerial implications: Users of the UWES-17 may interpret a single, summative score for work engagement. Findings of this study should also contribute towards standardising UWES-17 scores, allowing meaningful comparisons to be made. Contribution/value-add: The findings will benefit researchers, organisational consultants and managers. Clarity on dimensionality and interpretation of work engagement will assist researchers in future studies. Managers and consultants will be able to make better-informed decisions when using work engagement data.
Full Text Available Penelitian ini bertujuan untuk menghasilkan sebuah alat ukur (tes berpikir kritis yang valid dan reliabel untuk digunakan, baik dalam lingkup pendidikan maupun kerja di Indonesia. Tahapan penelitian dilakukan berdasarkan tahap pengembangan tes menurut Hambleton dan Jones (1993. Kisi-kisi dan pembuatan butir didasarkan pada konsep dalam tes Watson-Glaser Critical Thinking Appraisal (WGCTA. Pada WGCTA, berpikir kritis terdiri dari lima dimensi yaitu Inference, Recognition Assumption, Deduction, Interpretation dan Evaluation of arguments. Uji coba tes dilakukan pada 1.453 peserta tes seleksi karyawan di Surabaya, Gresik, Tuban, Bojonegoro, Rembang. Data dikotomi dianalisis dengan menggunakan model IRT dengan dua parameter yaitu daya beda dan tingkat kesulitan butir. Analisis dilakukan dengan menggunakan program statistik Mplus versi 6.11 Sebelum melakukan analisis dengan IRT, dilakukan pengujian asumsi yaitu uji unidimensionalitas, independensi lokal dan Item Characteristic Curve (ICC. Hasil analisis terhadap 68 butir menghasilkan 15 butir dengan daya beda yang cukup baik dan tingkat kesulitan butir yang berkisar antara –4 sampai dengan 2.448. Sedikitnya jumlah butir yang berkualitas baik disebabkan oleh kelemahan dalam menentukan subject matter experts di bidang berpikir kritis dan pemilihan metode skoring. Kata kunci: Pengembangan tes, berpikir kritis, item response theory DEVELOPING CRITICAL THINKING TEST UTILISING ITEM RESPONSE THEORY Abstract The present study was aimed to develop a valid and reliable instrument in assesing critical thinking which can be implemented both in educational and work settings in Indonesia. Following the Hambleton and Jones’s (1993 procedures on test development, the study developed the instrument by employing the concept of critical thinking from Watson-Glaser Critical Thinking Appraisal (WGCTA. The study included five dimensions of critical thinking as adopted from the WGCTA: Inference, Recognition
Liu, Chen-Wei; Wang, Wen-Chung
Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
Full Text Available Abstract Background Previous studies have analyzed the psychometric properties of the World Health Organization Disability Assessment Schedule II (WHO-DAS II using classical omnibus measures of scale quality. These analyses are sample dependent and do not model item responses as a function of the underlying trait level. The main objective of this study was to examine the effectiveness of the WHO-DAS II items and their options in discriminating between changes in the underlying disability level by means of item response analyses. We also explored differential item functioning (DIF in men and women. Methods The participants were 3615 adult general practice patients from 17 regions of Spain, with a first diagnosed major depressive episode. The 12-item WHO-DAS II was administered by the general practitioners during the consultation. We used a non-parametric item response method (Kernel-Smoothing implemented with the TestGraf software to examine the effectiveness of each item (item characteristic curves and their options (option characteristic curves in discriminating between changes in the underliying disability level. We examined composite DIF to know whether women had a higher probability than men of endorsing each item. Results Item response analyses indicated that the twelve items forming the WHO-DAS II perform very well. All items were determined to provide good discrimination across varying standardized levels of the trait. The items also had option characteristic curves that showed good discrimination, given that each increasing option became more likely than the previous as a function of increasing trait level. No gender-related DIF was found on any of the items. Conclusions All WHO-DAS II items were very good at assessing overall disability. Our results supported the appropriateness of the weights assigned to response option categories and showed an absence of gender differences in item functioning.
David Thissen, a professor in the Department of Psychology and Neuroscience, Quantitative Program at the University of North Carolina, has consulted and served on technical advisory committees for assessment programs that use item response theory (IRT) over the past couple decades. He has come to the conclusion that there are usually two purposes…
Ames, Allison J.; Samonte, Kelli
Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian…
Huang, Hung-Yu; Wang, Wen-Chung
In the social sciences, latent traits often have a hierarchical structure, and data can be sampled from multiple levels. Both hierarchical latent traits and multilevel data can occur simultaneously. In this study, we developed a general class of item response theory models to accommodate both hierarchical latent traits and multilevel data. The…
The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate "p"-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…
Andersson, Björn; Xin, Tao
In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…
Bowman, Nicholas A.; Herzog, Serge; Sharkness, Jessica
Item Response Theory (IRT) is a measurement theory that is ideal for scale and test development in institutional research, but it is not without its drawbacks. This chapter provides an overview of IRT, describes an example of its use, and highlights the pros and cons of using IRT in applied settings.
Ip, Edward; Molenberghs, Geert; Chen, Shyh-Huei
The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model to such multidimensio......The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model...... to such multidimensional data is believed to result in ability estimates that represent a combination of the major and minor dimensions. We conjecture that the underlying dimension for the fitted unidimensional model, which we call the functional dimension, represents a nonlinear projection. In this article we investigate...... tool. An example regarding a construct of desire for physical competency is used to illustrate the functional unidimensional approach....
Molenaar, Dylan; de Boeck, Paul
In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
Lalor, John P; Wu, Hao; Yu, Hong
Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Chalmers, R Philip; Pek, Jolynn; Liu, Yang
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
Matthew S. Johnson
Full Text Available Item response theory (IRT models are a class of statistical models used by researchers to describe the response behaviors of individuals to a set of categorically scored items. The most common IRT models can be classified as generalized linear fixed- and/or mixed-effect models. Although IRT models appear most often in the psychological testing literature, researchers in other fields have successfully utilized IRT-like models in a wide variety of applications. This paper discusses the three major methods of estimation in IRT and develops R functions utilizing the built-in capabilities of the R environment to find the marginal maximum likelihood estimates of the generalized partial credit model. The currently available R packages ltm is also discussed.
Composite assessments aim to combine different aspects of a disease in a single score and are utilized in a variety of therapeutic areas. The data arising from these evaluations are inherently discrete with distinct statistical properties. This tutorial presents the framework of the item response theory (IRT) for the analysis of this data type in a pharmacometric context. The article considers both conceptual (terms and assumptions) and practical questions (modeling software, data requirements, and model building). PMID:29493119
Liu, Yang; Maydeu-Olivares, Alberto
When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X(2), (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M2 statistic applied to bivariate subtables. The unadjusted Pearson's X(2) with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X(2) is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.
Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C
Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using an unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing The CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
A new and very interesting approach to the analysis of responses and response times is proposed by Goldhammer (this issue). In his approach, differences in the speed-ability compromise within respondents are considered to confound the differences in ability between respondents. These confounding effects of speed on the inferences about ability can…
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Tijmstra, Jesper; Bolsinova, Maria; Jeon, Minjeong
This article proposes a general mixture item response theory (IRT) framework that allows for classes of persons to differ with respect to the type of processes underlying the item responses. Through the use of mixture models, nonnested IRT models with different structures can be estimated for different classes, and class membership can be estimated for each person in the sample. If researchers are able to provide competing measurement models, this mixture IRT framework may help them deal with some violations of measurement invariance. To illustrate this approach, we consider a two-class mixture model, where a person's responses to Likert-scale items containing a neutral middle category are either modeled using a generalized partial credit model, or through an IRTree model. In the first model, the middle category ("neither agree nor disagree") is taken to be qualitatively similar to the other categories, and is taken to provide information about the person's endorsement. In the second model, the middle category is taken to be qualitatively different and to reflect a nonresponse choice, which is modeled using an additional latent variable that captures a person's willingness to respond. The mixture model is studied using simulation studies and is applied to an empirical example.
Bilir, Mustafa Kuzey
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
Scott, Terry F.; Schumayer, Daniel
In this paper we present a series of item response models of data collected using the Force Concept Inventory. The Force Concept Inventory (FCI) was designed to poll the Newtonian conception of force viewed as a multidimensional concept, that is, as a complex of distinguishable conceptual dimensions. Several previous studies have developed single-trait item response models of FCI data; however, we feel that multidimensional models are also appropriate given the explicitly multidimensional design of the inventory. The models employed in the research reported here vary in both the number of fitting parameters and the number of underlying latent traits assumed. We calculate several model information statistics to ensure adequate model fit and to determine which of the models provides the optimal balance of information and parsimony. Our analysis indicates that all item response models tested, from the single-trait Rasch model through to a model with ten latent traits, satisfy the standard requirements of fit. However, analysis of model information criteria indicates that the five-trait model is optimal. We note that an earlier factor analysis of the same FCI data also led to a five-factor model. Furthermore the factors in our previous study and the traits identified in the current work match each other well. The optimal five-trait model assigns proficiency scores to all respondents for each of the five traits. We construct a correlation matrix between the proficiencies in each of these traits. This correlation matrix shows strong correlations between some proficiencies, and strong anticorrelations between others. We present an interpretation of this correlation matrix.
Pilkonis, Paul A; Kim, Yookyung; Yu, Lan; Morse, Jennifer Q
The Adult Attachment Ratings (AAR) include 3 scales for anxious, ambivalent attachment (excessive dependency, interpersonal ambivalence, and compulsive care-giving), 3 for avoidant attachment (rigid self-control, defensive separation, and emotional detachment), and 1 for secure attachment. The scales include items (ranging from 6-16 in their original form) scored by raters using a 3-point format (0 = absent, 1 = present, and 2 = strongly present) and summed to produce a total score. Item response theory (IRT) analyses were conducted with data from 414 participants recruited from psychiatric outpatient, medical, and community settings to identify the most informative items from each scale. The IRT results allowed us to shorten the scales to 5-item versions that are more precise and easier to rate because of their brevity. In general, the effective range of measurement for the scales was 0 to +2 SDs for each of the attachment constructs; that is, from average to high levels of attachment problems. Evidence for convergent and discriminant validity of the scales was investigated by comparing them with the Experiences of Close Relationships-Revised (ECR-R) scale and the Kobak Attachment Q-sort. The best consensus among self-reports on the ECR-R, informant ratings on the ECR-R, and expert judgments on the Q-sort and the AAR emerged for anxious, ambivalent attachment. Given the good psychometric characteristics of the scale for secure attachment, however, this measure alone might provide a simple alternative to more elaborate procedures for some measurement purposes. Conversion tables are provided for the 7 scales to facilitate transformation from raw scores to IRT-calibrated (theta) scores.
Molenaar, Dylan; Oberski, Daniel; Vermunt, Jeroen; De Boeck, Paul
Current approaches to model responses and response times to psychometric tests solely focus on between-subject differences in speed and ability. Within subjects, speed and ability are assumed to be constants. Violations of this assumption are generally absorbed in the residual of the model. As a result, within-subject departures from the between-subject speed and ability level remain undetected. These departures may be of interest to the researcher as they reflect differences in the response processes adopted on the items of a test. In this article, we propose a dynamic approach for responses and response times based on hidden Markov modeling to account for within-subject differences in responses and response times. A simulation study is conducted to demonstrate acceptable parameter recovery and acceptable performance of various fit indices in distinguishing between different models. In addition, both a confirmatory and an exploratory application are presented to demonstrate the practical value of the modeling approach.
Polak, Marike; De Rooij, Mark; Heiser, Willem J.
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Fan, Zhewen; Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey
Traditional methods for item selection in computerized adaptive testing only focus on item information without taking into consideration the time required to answer an item. As a result, some examinees may receive a set of items that take a very long time to finish, and information is not accrued as efficiently as possible. The authors propose two…
Tian, Guo-Liang; Tang, Man-Lai; Wu, Qin; Liu, Yin
Although the item count technique is useful in surveys with sensitive questions, privacy of those respondents who possess the sensitive characteristic of interest may not be well protected due to a defect in its original design. In this article, we propose two new survey designs (namely the Poisson item count technique and negative binomial item count technique) which replace several independent Bernoulli random variables required by the original item count technique with a single Poisson or negative binomial random variable, respectively. The proposed models not only provide closed form variance estimate and confidence interval within [0, 1] for the sensitive proportion, but also simplify the survey design of the original item count technique. Most importantly, the new designs do not leak respondents' privacy. Empirical results show that the proposed techniques perform satisfactorily in the sense that it yields accurate parameter estimate and confidence interval.
Baghaei, Purya; Ravand, Hamdollah
In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
In attitude measurement and sensory tests, the unfolding model is typically used. In this model, response probability is formulated by the distance between the person and the stimulus. In this study, we proposed an unfolding item response model using best-worst scaling (BWU model), in which a person chooses the best and worst stimulus among repeatedly presented subsets of stimuli. We also formulated an unfolding model using best scaling (BU model), and compared the accuracy of estimates between the BU and BWU models. A simulation experiment showed that the BWU modell performed much better than the BU model in terms of bias and root mean square errors of estimates. With reference to Usami (2011), the proposed models were apllied to actual data to measure attitudes toward tardiness. Results indicated high similarity between stimuli estimates generated with the proposed models and those of Usami (2011).
van Rijn, Peter W; Ali, Usama S
We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.
Full Text Available Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D. To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6 in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS, which comprises four subsamples: (1 a national random digit dialing (RDD sample, (2 oversamples from five metropolitan areas, (3 siblings of individuals from the RDD sample, and (4 a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The pattern of total score distribution and item responses were analyzed using graphical analysis and exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales.
Partington, Susan N; Menzies, Tim J; Colburn, Trina A; Saelens, Brian E; Glanz, Karen
The community food environment may contribute to obesity by influencing food choice. Store and restaurant audits are increasingly common methods for assessing food environments, but are time consuming and costly. A valid, reliable brief measurement tool is needed. The purpose of this study was to develop and validate reduced-item food environment audit tools for stores and restaurants. Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed in 820 stores and 1,795 restaurants in West Virginia, San Diego, and Seattle. Data mining techniques (correlation-based feature selection and linear regression) were used to identify survey items highly correlated to total survey scores and produce reduced-item audit tools that were subsequently validated against full NEMS surveys. Regression coefficients were used as weights that were applied to reduced-item tool items to generate comparable scores to full NEMS surveys. Data were collected and analyzed in 2008-2013. The reduced-item tools included eight items for grocery, ten for convenience, seven for variety, and five for other stores; and 16 items for sit-down, 14 for fast casual, 19 for fast food, and 13 for specialty restaurants-10% of the full NEMS-S and 25% of the full NEMS-R. There were no significant differences in median scores for varying types of retail food outlets when compared to the full survey scores. Median in-store audit time was reduced 25%-50%. Reduced-item audit tools can reduce the burden and complexity of large-scale or repeated assessments of the retail food environment without compromising measurement quality. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Amber D. Dumford
Full Text Available As society’s needs for quantitative skills become more prevalent, college graduates require quantitative skills regardless of their career choices. Therefore, it is important that institutions assess students’ engagement in quantitative activities during college. This study chronicles the process taken by the National Survey of Student Engagement (NSSE to develop items that measure students’ participation in quantitative reasoning (QR activities. On the whole, findings across the quantitative and qualitative analyses suggest good overall properties for the developed QR items. The items show great promise to explore and evaluate the frequency with which college students participate in QR-related activities. Each year, hundreds of institutions across the United States and Canada participate in NSSE, and, with the addition of these new items on the core survey, every participating institution will have information on this topic. Our hope is that these items will spur conversations on campuses about students’ use of quantitative reasoning activities.
Cross-national comparisons of responses to survey items are often affected by response style, particularly extreme response style (ERS). ERS varies across cultures, and has the potential to bias inferences in cross-national comparisons. For example, in both PISA and TIMSS assessments, it has been documented that when examined within countries,…
Meijer, R.R.; Sotaridona, Leonardo
We propose a new method for detecting item preknowledge in a CAT based on an estimate of “effective response time” for each item. Effective response time is defined as the time required for an individual examinee to answer an item correctly. An unusually short response time relative to the expected
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T
Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Full Text Available In joint models for item response times (RTs and response accuracy (RA, local item dependence is composed of local RA dependence and local RT dependence. The two components are usually caused by the same common stimulus and emerge as pairs. Thus, the violation of local item independence in the joint models is called paired local item dependence. To address the issue of paired local item dependence while applying the joint cognitive diagnosis models (CDMs, this study proposed a joint testlet cognitive diagnosis modeling approach. The proposed approach is an extension of Zhan et al. (2017 and it incorporates two types of random testlet effect parameters (one for RA and the other for RTs to account for paired local item dependence. The model parameters were estimated using the full Bayesian Markov chain Monte Carlo (MCMC method. The 2015 PISA computer-based mathematics data were analyzed to demonstrate the application of the proposed model. Further, a brief simulation study was conducted to demonstrate the acceptable parameter recovery and the consequence of ignoring paired local item dependence.
Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.
Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
Zhao, Yue; Kuan, Hoi Kei; Chung, Joyce O K; Chan, Cecilia K Y; Li, William H C
The investigation of learning approaches in the clinical workplace context has remained an under-researched area. Despite the validation of learning approach instruments and their applications in various clinical contexts, little is known about the extent to which an individual item, that reflects a specific learning strategy and motive, effectively contributes to characterizing students' learning approaches. This study aimed to measure nursing students' approaches to learning in a clinical practicum using the Approaches to Learning at Work Questionnaire (ALWQ). Survey research design was used in the study. A sample of year 3 nursing students (n = 208) who undertook a 6-week clinical practicum course participated in the study. Factor analyses were conducted, followed by an item response theory analysis, including model assumption evaluation (unidimensionality and local independence), item calibration and goodness-of-fit assessment. Two subscales, deep and surface, were derived. Findings suggested that: (a) items measuring the deep motive from intrinsic interest and deep strategies of relating new ideas to similar situations, and that of concept mapping served as the strongest discriminating indicators; (b) the surface strategy of memorizing facts and details without an overall picture exhibited the highest discriminating power among all surface items; and, (c) both subscales appeared to be informative in assessing a broad range of the corresponding latent trait. The 21-item ALWQ derived from this study presented an efficient, internally consistent and precise measure. Findings provided a useful psychometric evaluation of the ALWQ in the clinical practicum context, added evidence to the utility of the ALWQ for nursing education practice and research, and echoed the discussions from previous studies on the role of the contextual factors in influencing student choices of different learning strategies. They provided insights for clinical educators to measure
Pearson, Jennifer L; Hitchman, Sara C; Brose, Leonie S; Bauld, Linda; Glasser, Allison M; Villanti, Andrea C; McNeill, Ann; Abrams, David B; Cohen, Joanna E
A consistent approach using standardised items to assess e-cigarette use in both youth and adult populations will aid cross-survey and cross-national comparisons of the effect of e-cigarette (and tobacco) policies and improve our understanding of the population health impact of e-cigarette use. Focusing on adult behaviour, we propose a set of e-cigarette use items, discuss their utility and potential adaptation, and highlight e-cigarette constructs that researchers should avoid without further item development. Reliable and valid items will strengthen the emerging science and inform knowledge synthesis for policy-making. Building on informal discussions at a series of international meetings of 65 experts from 15 countries, the authors provide recommendations for assessing e-cigarette use behaviour, relative perceived harm, device type, presence of nicotine, flavours and reasons for use. We recommend items assessing eight core constructs: e-cigarette ever use, frequency of use and former daily use; relative perceived harm; device type; primary flavour preference; presence of nicotine; and primary reason for use. These items should be standardised or minimally adapted for the policy context and target population. Researchers should be prepared to update items as e-cigarette device characteristics change. A minimum set of e-cigarette items is proposed to encourage consensus around items to allow for cross-survey and cross-jurisdictional comparisons of e-cigarette use behaviour. These proposed items are a starting point. We recognise room for continued improvement, and welcome input from e-cigarette users and scientific colleagues. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Mallinckrodt, Brent; Tekie, Yacob T
The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest the BAI performed better in several respects than the WAI-s or WAI-sr and equivalent to the full WAI on several performance indicators.
The study was to identify the load, the type and the significance of differential item functioning (DIF) in constructed response item using the partial credit model (PCM). The data in the study were the students’ instruments and the students’ responses toward the PISA-like test items that had been completed by 386 ninth grade students and 460 tenth grade students who had been about 15 years old in the Province of Yogyakarta Special Region in Indonesia. The analysis toward the item characteris...
Depaoli, Sarah; Tiemensma, Jitske; Felt, John M
The multidimensional graded response model, an item response theory (IRT) model, can be used to improve the assessment of surveys, even when sample sizes are restricted. Typically, health-based survey development utilizes classical statistical techniques (e.g. reliability and factor analysis). In a review of four prominent journals within the field of Health Psychology, we found that IRT-based models were used in less than 10% of the studies examining scale development or assessment. However, implementing IRT-based methods can provide more details about individual survey items, which is useful when determining the final item content of surveys. An example using a quality of life survey for Cushing's syndrome (CushingQoL) highlights the main components for implementing the multidimensional graded response model. Patients with Cushing's syndrome (n = 397) completed the CushingQoL. Results from the multidimensional graded response model supported a 2-subscale scoring process for the survey. All items were deemed as worthy contributors to the survey. The graded response model can accommodate unidimensional or multidimensional scales, be used with relatively lower sample sizes, and is implemented in free software (example code provided in online Appendix). Use of this model can help to improve the quality of health-based scales being developed within the Health Sciences.
Breivik, Kyrre; Olweus, Dan
In the present article, we used IRT (graded response) modeling as a useful technology for a detailed and refined study of the psychometric properties of the various items of the Olweus Bullying scale and the scale itself. The sample consisted of a very large number of Norwegian 4th-10th grade students (n = 48 926). The IRT analyses revealed that the scale was essentially unidimensional and had excellent reliability in the upper ranges of the latent bullying tendency trait, as intended and desired. Gender DIF effects were identified with regard to girls' use of indirect bullying by social exclusion and boys' use of physical bullying by hitting and kicking but these effects were small and worked in opposite directions, having negligible effects at the scale level. Also scale scores adjusted for DIF effects differed very little from non-adjusted scores. In conclusion, the empirical data were well characterized by the chosen IRT model and the Olweus Bullying scale was considered well suited for the conduct of fair and reliable comparisons involving different gender-age groups. Information Aggr. Behav. 9999:XX-XX, 2014. © 2014 Wiley Periodicals, Inc. © 2014 Wiley Periodicals, Inc.
Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS
Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses Item Response Theory (IRT) to investigate the learning gains of students as measured by the LSCI. IRT provides a theoretical model to generate parameters accounting for students’ abilities. We use IRT to measure changes in students’ abilities to reason about light from pre- to post-instruction. Changes in students’ abilities are compared by classroom to better understand the learning that is taking place in classrooms across the country. We compare the average change in ability for each classroom to the Interactivity Assessment Score (IAS) to provide further insight into the prior results presented from this data set. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Huang, Yueng-Hsiang; Lee, Jin; Chen, Zhuo; Perry, MacKenna; Cheung, Janelle H; Wang, Mo
Zohar and Luria's (2005) safety climate (SC) scale, measuring organization- and group- level SC each with 16 items, is widely used in research and practice. To improve the utility of the SC scale, we shortened the original full-length SC scales. Item response theory (IRT) analysis was conducted using a sample of 29,179 frontline workers from various industries. Based on graded response models, we shortened the original scales in two ways: (1) selecting items with above-average discriminating ability (i.e. offering more than 6.25% of the original total scale information), resulting in 8-item organization-level and 11-item group-level SC scales; and (2) selecting the most informative items that together retain at least 30% of original scale information, resulting in 4-item organization-level and 4-item group-level SC scales. All four shortened scales had acceptable reliability (≥0.89) and high correlations (≥0.95) with the original scale scores. The shortened scales will be valuable for academic research and practical survey implementation in improving occupational safety. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Sass, D. A.; Schmitt, T. A.; Walker, C. M.
Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal…
Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L
The Geriatric Anxiety Scale (GAS; Segal et al. (Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
Full Text Available Based on nonlinear models between the measured latent variable and the item response, item response theory (IRT enables independent estimation of item and person parameters and local estimation of measurement error. These properties of IRT are also the main theoretical advantages of IRT over classical test theory (CTT. Empirical evidence, however, often failed to discover consistent differences between IRT and CTT parameters and between invariance measures of CTT and IRT parameter estimates. In this empirical study a real data set from the Third International Mathematics and Science Study (TIMSS 1995 was used to address the following questions: (1 How comparable are CTT and IRT based item and person parameters? (2 How invariant are CTT and IRT based item parameters across different participant groups? (3 How invariant are CTT and IRT based item and person parameters across different item sets? The findings indicate that the CTT and the IRT item/person parameters are very comparable, that the CTT and the IRT item parameters show similar invariance property when estimated across different groups of participants, that the IRT person parameters are more invariant across different item sets, and that the CTT item parameters are at least as much invariant in different item sets as the IRT item parameters. The results furthermore demonstrate that, with regards to the invariance property, IRT item/person parameters are in general empirically superior to CTT parameters, but only if the appropriate IRT model is used for modelling the data.
Bergner, Yoav; Droschler, Stefan; Kortemeyer, Gerd; Rayyan, Saif; Seaton, Daniel; Pritchard, David E.
We apply collaborative filtering (CF) to dichotomously scored student response data (right, wrong, or no interaction), finding optimal parameters for each student and item based on cross-validated prediction accuracy. The approach is naturally suited to comparing different models, both unidimensional and multidimensional in ability, including a…
Kaskowitz, Gary S.; De Ayala, R. J.
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn
The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto
To analyze, by means of "Item Response Theory", an instrument to measure adherence to t treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, 2011 using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the gradual response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and factor analysis of complete information. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The instrument analyzed is limited in measuring adherence to hypertension treatment, by analyzing the "Item Response Theory" of the item, and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
Andrich, David; Humphry, Stephen M.; Marais, Ida
Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…
... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false How does OPM select survey items? 591.212 Section 591.212 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL SERVICE REGULATIONS ALLOWANCES AND DIFFERENTIALS Cost-of-Living Allowance and Post Differential-Nonforeign Areas Cost-Of-Living...
May, Alexis; Klonsky, E. David
The Youth Risk Behavior Survey (YRBS) is used by the United States Centers for Disease Control to estimate rates of suicidal thoughts and behaviors in adolescents. This study investigated the validity of the YRBS suicidality items by examining their relationship to criterion variables including loneliness, anxiety, depression, substance use, and…
Hong, Ickpyo; Lee, Mi Jung; Kim, Moon Young; Park, Hae Yean
The aim of this study is to investigate the psychometrics of the 12 items of an instrument assessing activities of daily living (ADL) using an item response theory model. A total of 648 adults with physical disabilities and having difficulties in ADLs were retrieved from the 2014 Korean National Survey on People with Disabilities. The psychometric testing included factor analysis, internal consistency, precision, and differential item functioning (DIF) across categories including sex, older age, marital status, and physical impairment area. The sample had a mean age of 69.7 years old (SD = 13.7). The majority of the sample had lower extremity impairments (62.0%) and had at least 2.1 chronic conditions. The instrument demonstrated unidimensional construct and good internal consistency (Cronbach's alpha = 0.95). The instrument precisely estimated person measures within a wide range of theta values (-2.22 logits 5.0%). Our findings indicate that the dressing item would need to be modified to improve its psychometrics. Overall, the ADL instrument demonstrates good psychometrics, and thus, it may be used as a standardized instrument for measuring disability in rehabilitation contexts. However, the findings are limited to adults with physical disabilities. Future studies should replicate psychometric testing for survey respondents with other disorders and for children.
Sekely, Angela; Taylor, Graeme J; Bagby, R Michael
The Toronto Structured Interview for Alexithymia (TSIA) was developed to provide a structured interview method for assessing alexithymia. One drawback of this instrument is the amount of time it takes to administer and score. The current study used item response theory (IRT) methods to analyze data from a large heterogeneous multi-language sample (N = 842) to investigate whether a subset of items could be selected to create a short version of the instrument. Samejima's (1969) graded response model was used to fit the item responses. Items providing maximum information were retained in the short model, resulting in the elimination of 12-items from the original 24-items. Despite the 50% reduction in the number of items, 65.22% of the information was retained. Further studies are needed to validate the short version. A short version of the TSIA is potentially of practical value to clinicians and researchers with time constraints. Copyright © 2018. Published by Elsevier B.V.
Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri
Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
Pearson, Jennifer L; Hitchman, Sara C; Brose, Leonie S; Bauld, Linda; Glasser, Allison M; Villanti, Andrea C; McNeill, Ann; Abrams, David B; Cohen, Joanna E
Background: A consistent approach using standardized items to assess e-cigarette use in both youth and adult populations will aid cross-survey and cross-national comparisons of the effect of e-cigarette (and tobacco) policies and improve our understanding of the population health impact of e-cigarette use. Focusing on adult behavior, we propose a set of e-cigarette use items, discuss their utility and potential adaptation, and highlight e-cigarette constructs that researchers should avoid wit...
Lake, Christopher J.; Withrow, Scott; Zickar, Michael J.; Wood, Nicole L.; Dalal, Dev K.; Bochinski, Joseph
Adapting the original latitude of acceptance concept to Likert-type surveys, response latitudes are defined as the range of graded response options a person is willing to endorse. Response latitudes were expected to relate to attitude involvement such that high involvement was linked to narrow latitudes (the result of selective, careful…
Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
Paek, Insu; Han, Kyung T.
This article reviews a new item response theory (IRT) model estimation program, IRTPRO 2.1, for Windows that is capable of unidimensional and multidimensional IRT model estimation for existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data. (Contains 1 figure and 2 notes.)
This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…
Fletcher, Richard B.; Crocker, Peter
The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…
van der Linden, Willem J.
Dichotomous item response theory (IRT) models can be viewed as families of stochastically ordered distributions of responses to test items. This paper explores several properties of such distributiom. The focus is on the conditions under which stochastic order in families of conditional
Holman, Rebecca; Glas, Cornelis A.W.
A model-based procedure for assessing the extent to which missing data can be ignored and handling non-ignorable missing data is presented. The procedure is based on item response theory modelling. As an example, the approach is worked out in detail in conjunction with item response data modelled
Holman, Rebecca; Glas, Cees A. W.
A model-based procedure for assessing the extent to which missing data can be ignored and handling non-ignorable missing data is presented. The procedure is based on item response theory modelling. As an example, the approach is worked out in detail in conjunction with item response data modelled
Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim
We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Bjorner, Jakob B; Rose, Matthias; Gandek, Barbara
assistant (PDA), or personal computer (PC) on the Internet, and a second form by PC, in the same administration. Structural invariance, equivalence of item responses, and measurement precision were evaluated using confirmatory factor analysis and item response theory methods. RESULTS: Multigroup...... levels in IVR, PQ, or PDA administration as compared to PC. Availability of large item response theory-calibrated PROMIS item banks allowed for innovations in study design and analysis.......PURPOSE: To test the impact of method of administration (MOA) on the measurement characteristics of items developed in the Patient-Reported Outcomes Measurement Information System (PROMIS). METHODS: Two non-overlapping parallel 8-item forms from each of three PROMIS domains (physical function...
Hoffmann-Eßer, Wiebke; Siering, Ulrich; Neugebauer, Edmund A M; Brockhaus, Anne Catharina; McGauran, Natalie; Eikermann, Michaela
The AGREE II instrument is the most commonly used guideline appraisal tool. It includes 23 appraisal criteria (items) organized within six domains. AGREE II also includes two overall assessments (overall guideline quality, recommendation for use). Our aim was to investigate how strongly the 23 AGREE II items influence the two overall assessments. An online survey of authors of publications on guideline appraisals with AGREE II and guideline users from a German scientific network was conducted between 10th February 2015 and 30th March 2015. Participants were asked to rate the influence of the AGREE II items on a Likert scale (0 = no influence to 5 = very strong influence). The frequencies of responses and their dispersion were presented descriptively. Fifty-eight of the 376 persons contacted (15.4%) participated in the survey and the data of the 51 respondents with prior knowledge of AGREE II were analysed. Items 7-12 of Domain 3 (rigour of development) and both items of Domain 6 (editorial independence) had the strongest influence on the two overall assessments. In addition, Items 15-17 (clarity of presentation) had a strong influence on the recommendation for use. Great variations were shown for the other items. The main limitation of the survey is the low response rate. In guideline appraisals using AGREE II, items representing rigour of guideline development and editorial independence seem to have the strongest influence on the two overall assessments. In order to ensure a transparent approach to reaching the overall assessments, we suggest the inclusion of a recommendation in the AGREE II user manual on how to consider item and domain scores. For instance, the manual could include an a-priori weighting of those items and domains that should have the strongest influence on the two overall assessments. The relevance of these assessments within AGREE II could thereby be further specified.
Jan Ketil Arnulf
Full Text Available Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60-86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.
Full Text Available The study was to identify the load, the type and the significance of differential item functioning (DIF in constructed response item using the partial credit model (PCM. The data in the study were the students’ instruments and the students’ responses toward the PISA-like test items that had been completed by 386 ninth grade students and 460 tenth grade students who had been about 15 years old in the Province of Yogyakarta Special Region in Indonesia. The analysis toward the item characteristics through the student categorization based on their class was conducted toward the PCM using CONQUEST software. Furthermore, by applying these items characteristics, the researcher draw the category response function (CRF graphic in order to identify whether the type of DIF content had been in uniform or non-uniform. The significance of DIF was identified by comparing the discrepancy between the difficulty level parameter and the error in the CONQUEST output results. The results of the analysis showed that from 18 items that had been analyzed there were 4 items which had not been identified load DIF, there were 5 items that had been identified containing DIF but not statistically significant and there were 9 items that had been identified containing DIF significantly. The causes of items containing DIF were discussed.
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14- and 15-items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel
Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Full Text Available This study investigated the multiple-choice test of understanding of vectors (TUV, by applying item response theory (IRT. The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test’s distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
Full Text Available In the psychometric literature, item response theory models have been proposed that explicitly take the decision process underlying the responses of subjects to psychometric test items into account. Application of these models is however hampered by the absence of general and flexible software to fit these models. In this paper, we present diffIRT, an R package that can be used to fit item response theory models that are based on a diffusion process. We discuss parameter estimation and model fit assessment, show the viability of the package in a simulation study, and illustrate the use of the package with two datasets pertaining to extraversion and mental rotation. In addition, we illustrate how the package can be used to fit the traditional diffusion model (as it has been originally developed in experimental psychology to data.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
João Carlos Hipólito Bernardes do Nascimento
Full Text Available This study aimed to measure the level of financial literacy of Business Administration course students at a federal Higher Education Institution (HEI. To this end, a survey was conducted on 307 students. The Item Response Theory (IRT was employed for data analysis and the findings support the conclusion that the students show a low level of financial literacy, as well as the existence of a conservative investment profile among students. This scenario, in line with previous empirical studies conducted in the Brazil, is worrying given the potential negative externalities resulting from poor financial decisions, especially those related to home financing and retirement preparations. This study contributes to the empirical evaluation, within the national context, of the use of IRT in estimating financial literacy, and shows that it is, indeed, an important methodological option in the estimation of this latent trait. Furthermore, this enables financial knowledge to be compared through consistent and reliable means, using studies, populations, realities and separate programs.
Utilização de medicamentos por aposentados brasileiros: 2 - Taxa de resposta e preenchimento de questionário postal em Belo Horizonte, Minas Gerais, Brasil Use of medication by Brazilian retirees: 2 - Response rate and item completeness in a postal survey in Belo Horizonte, Minas Gerais State, Brazil
Andréia Queiroz Ribeiro
Full Text Available São descritos a taxa de resposta e o preenchimento de questionários auto-administrados num inquérito postal sobre o perfil de utilização de medicamentos por aposentados e pensionistas do INSS, de 60 anos ou mais de idade no Município de Belo Horizonte, Minas Gerais, Brasil, em 2003. Os questionários foram enviados duas vezes para os endereços de 800 indivíduos sorteados por amostragem aleatória simples, com base no banco de dados do INSS. A taxa de resposta ao inquérito postal foi de 47,8% e não houve diferença significativa tanto entre participantes e não participantes quanto entre respondentes iniciais e tardios em relação às características selecionadas. Para a maioria das variáveis sócio-demográficas e de saúde, os percentuais de omissão de respostas não ultrapassaram 5%, tanto no total da amostra, quanto em cada um dos subgrupos de respondentes. As informações mais omitidas ocorreram para as variáveis relativas ao uso de medicamentos, com destaque para a não-utilização de medicamentos que deveriam ser usados, a dose e laboratório fabricante do medicamento. Nossos resultados indicam que o detalhamento de aspectos relacionados ao uso de medicamentos deve ser reconsiderado em questionários de autopreenchimento.This paper reports on the response rate and completeness of item response in a self-administered postal survey questionnaire on use of medication by retirees 60 years or older under the Brazilian Social Security System, in Belo Horizonte, Minas Gerais State, in 2003. Questionnaires were sent in two rounds to 800 postal addresses of subjects selected by simple random sampling. The response rate was 47.8%, and there were no significant differences in the selected characteristics between respondents and non-respondents, or between early and late respondents. For almost all socio-demographic and health variables, item omission was less than or equal to 5% for both the entire sample and early or late responders
Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J
Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items and were removed in stages, creating a 8- and 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
Fox, J.P.; Entink, R.K.; Avetisyan, M.
Randomized response (RR) models are often used for analysing univariate randomized response data and measuring population prevalence of sensitive behaviours. There is much empirical support for the belief that RR methods improve the cooperation of the respondents. Recently, RR models have been
Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)
Kohli, Nidhi; Koran, Jennifer; Henn, Lisa
There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…
Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…
Euzébio Batista, Marcelo Henrique; Victória Barbosa, Jorge Luis; da Rosa Tavares, João Elison; Hackenhaar, Jonathan Luis
This article shows the application of Item Response Theory (IRT) for educational evaluation using games. The article proposes a computational model to create user profiles, called Psychometric Profile Generator (PPG). PPG uses the IRT mathematical model for exploring the levels of skills and behaviors in the form of items and/or stimuli. The model…
Falk, Carl F.; Cai, Li
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009--2010. The written assessment was developed in several formats such as the Constructed Response (CR) items, Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The followings are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all
Item response theory (IRT) methodology allowed an in-depth examination of several issues that would be difficult to explore using traditional methodology. IRT models were estimated for 4 risky-choice items, answered by students under either a gain or loss frame. Results supported the typical framing finding of risk-aversion for gains and risk-seeking for losses but also suggested that a latent construct we label preference for risk was influential in predicting risky choice. Also, the Asian Disease item, most often used in framing research, was found to have anomalous statistical properties when compared to other framing items. Copyright 1998 Academic Press.
Shou, Yiyun; Sellbom, Martin; Xu, Jing
There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L
Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.
Jabrayilov, Ruslan; Emons, Wilco H. M.; Sijtsma, Klaas
Clinical psychologists are advised to assess clinical and statistical significance when assessing change in individual patients. Individual change assessment can be conducted using either the methodologies of classical test theory (CTT) or item response theory (IRT). Researchers have been optimistic
Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland
To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K
The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain
Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
all Soldiers. The BHNAS and MHAT surveys have yielded valuable information regarding the effects of combat and deployment on service members...and Barriers to Care • Amount of Sleep and Sleep Deficit • Sleep Difficulties • Military Specialty • Positive Effects of Assignment • Contribution...nonopioid prescription painkillers was added; (3) the definition of “constantly and frequent” was omitted in the question; and (4) the NUBHNAS
Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.
Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
This report presents the results of Phase I cultural resources survey and archeological inventory of two marine and 11 terrestrial project items on and near Marsh Island in Iberia Parish, Louisiana...
Pasca, Laura; Aragonés, Juan I; Coello, María T
The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature.
Knol Dirk L
Full Text Available Abstract Background For the Low Vision Quality Of Life questionnaire (LVQOL it is unknown whether the psychometric properties are satisfactory when an item response theory (IRT perspective is considered. This study evaluates some essential psychometric properties of the LVQOL questionnaire in an IRT model, and investigates differential item functioning (DIF. Methods Cross-sectional data were used from an observational study among visually-impaired patients (n = 296. Calibration was performed for every dimension of the LVQOL in the graded response model. Item goodness-of-fit was assessed with the S-X2-test. DIF was assessed on relevant background variables (i.e. age, gender, visual acuity, eye condition, rehabilitation type and administration type with likelihood-ratio tests for DIF. The magnitude of DIF was interpreted by assessing the largest difference in expected scores between subgroups. Measurement precision was assessed by presenting test information curves; reliability with the index of subject separation. Results All items of the LVQOL dimensions fitted the model. There was significant DIF on several items. For two items the maximum difference between expected scores exceeded one point, and DIF was found on multiple relevant background variables. Item 1 'Vision in general' from the "Adjustment" dimension and item 24 'Using tools' from the "Reading and fine work" dimension were removed. Test information was highest for the "Reading and fine work" dimension. Indices for subject separation ranged from 0.83 to 0.94. Conclusions The items of the LVQOL showed satisfactory item fit to the graded response model; however, two items were removed because of DIF. The adapted LVQOL with 21 items is DIF-free and therefore seems highly appropriate for use in heterogeneous populations of visually impaired patients.
Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl
The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometrics properties of the FTCD and the HSI via the Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Grade Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
Tian, Wei; Cai, Li; Thissen, David; Xin, Tao
In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus
The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.
Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
Assessment is an integral part of society and education, and for this reason it is important to know what you measure. This thesis is about explanatory item response modelling of an abstract reasoning assessment, with the objective to create a modern test design framework for automatic generation of valid and precalibrated items of abstract reasoning. Modern test design aims to strengthen the connections between the different components of a test, with a stress on strong theory, systematic it...
Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D
The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS) -elated walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87 % of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ 2 = 19.02, df = 5, p Item 11 showed DIF based on sex (χ 2 = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12 .
KLAUS D. KUBINGER
Full Text Available Multiple choice response formats are problematical as an item is often scored as solved simply because the test-taker is a lucky guesser. Instead of applying pertinent IRT models which take guessing effects into account, a pragmatic approach of re-conceptualizing multiple choice response formats to reduce the chance of lucky guessing is considered. This paper compares the free response format with two different multiple choice formats. A common multiple choice format with a single correct response option and five distractors (“1 of 6” is used, as well as a multiple choice format with five response options, of which any number of the five is correct and the item is only scored as mastered if all the correct response options and none of the wrong ones are marked (“x of 5”. An experiment was designed, using pairs of items with exactly the same content but different response formats. 173 test-takers were randomly assigned to two test booklets of 150 items altogether. Rasch model analyses adduced a fitting item pool, after the deletion of 39 items. The resulting item difficulty parameters were used for the comparison of the different formats. The multiple choice format “1 of 6” differs significantly from “x of 5”, with a relative effect of 1.63, while the multiple choice format “x of 5” does not significantly differ from the free response format. Therefore, the lower degree of difficulty of items with the “1 of 6” multiple choice format is an indicator of relevant guessing effects. In contrast the “x of 5” multiple choice format can be seen as an appropriate substitute for free response format.
Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul
Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
Molenaar, Dylan; Tuerlinckx, Francis; van der Maas, Han L J
A generalized linear modeling framework to the analysis of responses and response times is outlined. In this framework, referred to as bivariate generalized linear item response theory (B-GLIRT), separate generalized linear measurement models are specified for the responses and the response times that are subsequently linked by cross-relations. The cross-relations can take various forms. Here, we focus on cross-relations with a linear or interaction term for ability tests, and cross-relations with a curvilinear term for personality tests. In addition, we discuss how popular existing models from the psychometric literature are special cases in the B-GLIRT framework depending on restrictions in the cross-relation. This allows us to compare existing models conceptually and empirically. We discuss various extensions of the traditional models motivated by practical problems. We also illustrate the applicability of our approach using various real data examples, including data on personality and cognitive ability.
LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G
Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
Full Text Available Abstract Background Children's health and health behaviour are essential for their development and it is important to obtain abundant and accurate information to understand young people's health and health behaviour. The Health Behaviour in School-aged Children (HBSC study is among the first large-scale international surveys on adolescent health through self-report questionnaires. So far, more than 40 countries in Europe and North America have been involved in the HBSC study. The purpose of this study is to assess the test-retest reliability of selected items in the Chinese version of the HBSC survey questionnaire in a sample of adolescents in Beijing, China. Methods A sample of 95 male and female students aged 11 or 15 years old participated in a test and retest with a three weeks interval. Student Identity numbers of respondents were utilized to permit matching of test-retest questionnaires. 23 items concerning physical activity, sedentary behaviour, sleep and substance use were evaluated by using the percentage of response shifts and the single measure Intraclass Correlation Coefficients (ICC with 95% confidence interval (CI for all respondents and stratified by gender and age. Items on substance use were only evaluated for school children aged 15 years old. Results The percentage of no response shift between test and retest varied from 32% for the item on computer use at weekends to 92% for the three items on smoking. Of all the 23 items evaluated, 6 items (26% showed a moderate reliability, 12 items (52% displayed a substantial reliability and 4 items (17% indicated almost perfect reliability. No gender and age group difference of the test-retest reliability was found except for a few items on sedentary behaviour. Conclusions The overall findings of this study suggest that most selected indicators in the HBSC survey questionnaire have satisfactory test-retest reliability for the students in Beijing. Further test-retest studies in a large
Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Pereira, Vera Lúcia Duarte do Valle; Pacheco, Waldemar; Bornia, Antonio Cezar; de Andrade, Dalton Francisco
The aim of this paper is to measure the effectiveness of the organizations Information and Communication Technology (ICT) from the point of view of the manager, using Item Response Theory (IRT). There is a need to verify the effectiveness of these organizations which are normally associated to complex, dynamic, and competitive environments. In academic literature, there is disagreement surrounding the concept of organizational effectiveness and its measurement. A construct was elaborated based on dimensions of effectiveness towards the construction of the items of the questionnaire which submitted to specialists for evaluation. It demonstrated itself to be viable in measuring organizational effectiveness of ICT companies under the point of view of a manager through using Two-Parameter Logistic Model (2PLM) of the IRT. This modeling permits us to evaluate the quality and property of each item placed within a single scale: items and respondents, which is not possible when using other similar tools.
Varughese, K.G.; Kumar, M.; George, Thomas; Sunder Rajan, P.; Vijay Kumar, B.; Rajan, M.P.
High background radiation are found in nature at some parts of Australia, Brazil, China, Iran, India etc. Kanyakumari district in the southern peninsular India is such a NHBRA (Natural high background radiation area) having monazite placers along the coast. Although general radiation levels in this area has been investigated by many researchers in the past, the impact of this high background radioactivity on the flora and fauna is scarce. In the present investigations radiation survey has been done at high background areas with special attention to vegetables and crops grown in this area. The studies are centered at the 2x1000 MWe, Kudankulam Nuclear Power Project site which is about 25 km from Kanyakumari. Samples of soil, sand, vegetations and other food items are collected from the 30 km radial zone of KKNPP site and analysed for naturally occurring radionuclides such as 238 U, 232 Th and 40 K. The intake of natural radioactivity through food items produced in this area is found to be very small, and the internal dose to general population staying at this high natural background area is insignificant. (author)
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. this cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. a Mokken scale with good reliability (Molenaar Sijtsama statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@ oup.com.
Full Text Available We used standard and multilevel models to assess the reliability of core items in the General Social Survey panel studies spanning 2006 to 2014. Most of the 293 core items scored well on the measure of reliability: 62 items (21 percent had reliability measures greater than 0.85; another 71 (24 percent had reliability measures between 0.70 and 0.85. Objective items, especially facts about demography and religion, were generally more reliable than subjective items. The economic recession of 2007–2009, the slow recovery afterward, and the election of Barack Obama in 2008 altered the social context in ways that may look like unreliability of items. For example, unemployment status, hours worked, and weeks worked have lower reliability than most work-related items, reflecting the consequences of the recession on the facts of peoples lives. Items regarding racial and gender discrimination and racial stereotypes scored as particularly unreliable, accounting for most of the 15 items with reliability coefficients less than 0.40. Our results allow scholars to more easily take measurement reliability into consideration in their own research, while also highlighting the limitations of these approaches.
Full Text Available The study investigated the invariance properties of one, two and three parame-ter logistic item response theory models. It examined the best fit among one parameter logistic (1PL, two-parameter logistic (2PL and three-parameter logistic (3PL IRT models for SSCE, 2008 in Mathematics. It also investigated the degree of invariance of the IRT models based item difficulty parameter estimates in SSCE in Mathematics across different samples of examinees and examined the degree of invariance of the IRT models based item discrimination estimates in SSCE in Mathematics across different samples of examinees. In order to achieve the set objectives, 6000 students (3000 males and 3000 females were drawn from the population of 35262 who wrote the 2008 paper 1 Senior Secondary Certificate Examination (SSCE in Mathematics organized by National Examination Council (NECO. The item difficulty and item discrimination parameter estimates from CTT and IRT were tested for invariance using BLOG MG 3 and correlation analysis was achieved using SPSS version 20. The research findings were that two parameter model IRT item difficulty and discrimination parameter estimates exhibited invariance property consistently across different samples and that 2-parameter model was suitable for all samples of examinees unlike one-parameter model and 3-parameter model.
Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C
The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F
Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Full Text Available Abstract Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1 a cross-sectional health survey (the Scottish Health Education Population Survey and 2 a general population birth cohort study (the National Child Development Study illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items we show that all items from the 12-item General Health Questionnaire (GHQ-12 – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales. An illustration of ordinal item analysis
Stochl, Jan; Jones, Peter B; Croudace, Tim J
Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental
Emons, Wilco H.M.; Meijer, R.R.; Denollet, Johan
Objective: Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI)—referred to as type-D personality—are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The
Keuning, J.; Verhoeven, L.T.W.
The purpose of the present study was to explore whether the Item Response Theory (IRT) provides a suitable framework to screen for word reading and spelling problems during the elementary school period. The following issues were addressed from an IRT perspective: (a) the dimensionality of word
Chalmers, R. Philip
A mixed-effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed-effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis-Hastings Robbins-Monro (MH-RM) stochastic imputation algorithm to accommodate for…
The usual role of a discussant is to clarify and correct the paper being discussed, but in this case, the author, Howard Wainer, generally agrees with everything David Thissen says in his essay, "Bad Questions: An Essay Involving Item Response Theory." This essay expands on David Thissen's statement that there are typically two principal…
In item response theory (IRT), mathematical models are applied to analyze data from tests and questionnaires used to measure abilities, proficiency, personality traits and attitudes. This thesis is concerned with comparison of subjects, students and schools based on average examination grades using
Peeraer, Jef; Van Petegem, Peter
This research describes the development and validation of an instrument to measure integration of Information and Communication Technology (ICT) in education. After literature research on definitions of integration of ICT in education, a comparison is made between the classical test theory and the item response modeling approach for the…
Fox, Gerardus J.A.; Glas, Cornelis A.W.
This paper focuses on handling measurement error in predictor variables using item response theory (IRT). Measurement error is of great important in assessment of theoretical constructs, such as intelligence or the school climate. Measurement error is modeled by treating the predictors as unobserved
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
Gordon, Rachel A.
This article provides family scientists with an understanding of contemporary measurement perspectives and the ways in which item response theory (IRT) can be used to develop measures with desired evidence of precision and validity for research uses. The article offers a nontechnical introduction to some key features of IRT, including its…
Pan, Tianshu; Yin, Yue
In this article, we propose using the Bayes factors (BF) to evaluate person fit in item response theory models under the framework of Bayesian evaluation of an informative diagnostic hypothesis. We first discuss the theoretical foundation for this application and how to analyze person fit using BF. To demonstrate the feasibility of this approach,…
Doebler, Anna; Doebler, Philipp; Holling, Heinz
The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter [theta] is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the given…
Sinharay, Sandip; Haberman, Shelby J.
Standard 3.9 of the Standards for Educational and Psychological Testing ([, 1999]) demands evidence of model fit when item response theory (IRT) models are employed to data from tests. Hambleton and Han ([Hambleton, R. K., 2005]) and Sinharay ([Sinharay, S., 2005]) recommended the assessment of practical significance of misfit of IRT models, but…
Horzum, Mehmet Baris; Uyanik, Gülden Kaya
The aim of this study is to examine validity and reliability of Community of Inquiry Scale commonly used in online learning by the means of Item Response Theory. For this purpose, Community of Inquiry Scale version 14 is applied on 1,499 students of a distance education center's online learning programs at a Turkish state university via internet.…
Novakovic, A.M.; Krekels, E.H.; Munafo, A.; Ueckert, S.; Karlsson, M.O.
In this study, we report the development of the first item response theory (IRT) model within a pharmacometrics framework to characterize the disease progression in multiple sclerosis (MS), as measured by Expanded Disability Status Score (EDSS). Data were collected quarterly from a 96-week phase III
van Schuur, Wijbrandt H.
This article introduces a model of ordinal unidimensional measurement known as Mokken scale analysis. Mokken scaling is based on principles of Item Response Theory (IRT) that originated in the Guttman scale. I compare the Mokken model with both Classical Test Theory (reliability or factor analysis)
Van der Elst, Wim; Ouwehand, Carolijn; van Rijn, Peter; Lee, Nikki; Van Boxtel, Martin; Jolles, Jelle
The purpose of the present study was to evaluate the psychometric properties of a shortened version of the Raven Standard Progressive Matrices (SPM) under an item response theory framework (the one- and two-parameter logistic models). The shortened Raven SPM was administered to N = 453 cognitively healthy adults aged between 24 and 83 years. The…
Tsutsumi, A.; Iwata, N.; Watanabe, N.; Jonge, de J.; Pikhart, H.; Férnandez-López, J.A.; Xu, Liying; Peter, R.; Knutsson, A.; Niedhammer, I.; Kawakami, N.; Siegrist, J.
Our objective was to examine cross-cultural comparability of standard scales of the Effort-Reward Imbalance occupational stress scales by item response theory (IRT) analyses. Data were from 20,256 Japanese employees, 1464 Dutch nurses and nurses' aides, 2128 representative employees from
item. EMI detectors are operated in either time domain (TD) or frequency domain (FD). A common example of a hand-held EMI sensor is the 3-1 metal ...held Digital Emerging Fisher 1266-X Metal Detector Hand-held Analog Established Foerster MINEX 2FD 4.500 Hand-held Analog Established Minelab...ferrous and nonferrous metallic objects. • Effective in detecting near-surface objects. • Can be effective in geology that challenges magnetometers
Ramdath, D Dan; Wolever, Thomas M S; Siow, Yaw Chris; Ryland, Donna; Hawke, Aileen; Taylor, Carla; Zahradka, Peter; Aliani, Michel
The consumption of pulses is associated with many health benefits. This study assessed post-prandial blood glucose response (PPBG) and the acceptability of food items containing green lentils. In human trials we: (i) defined processing methods (boiling, pureeing, freezing, roasting, spray-drying) that preserve the PPBG-lowering feature of lentils; (ii) used an appropriate processing method to prepare lentil food items, and compared the PPBG and relative glycemic responses (RGR) of lentil and control foods; and (iii) conducted consumer acceptability of the lentil foods. Eight food items were formulated from either whole lentil puree (test) or instant potato (control). In separate PPBG studies, participants consumed fixed amounts of available carbohydrates from test foods, control foods, or a white bread standard. Finger prick blood samples were obtained at 0, 15, 30, 45, 60, 90, and 120 min after the first bite, analyzed for glucose, and used to calculate incremental area under the blood glucose response curve and RGR; glycemic index (GI) was measured only for processed lentils. Mean GI (± standard error of the mean) of processed lentils ranged from 25 ± 3 (boiled) to 66 ± 6 (spray-dried); the GI of spray-dried lentils was significantly ( p roasted lentil. Overall, lentil-based food items all elicited significantly lower RGR compared to potato-based items (40 ± 3 vs. 73 ± 3%; p chicken, chicken pot pie, and lemony parsley soup had the highest overall acceptability corresponding to "like slightly" to "like moderately". Processing influenced the PPBG of lentils, but food items formulated from lentil puree significantly attenuated PPBG. Formulation was associated with significant differences in sensory attributes.
D. Dan Ramdath
Full Text Available The consumption of pulses is associated with many health benefits. This study assessed post-prandial blood glucose response (PPBG and the acceptability of food items containing green lentils. In human trials we: (i defined processing methods (boiling, pureeing, freezing, roasting, spray-drying that preserve the PPBG-lowering feature of lentils; (ii used an appropriate processing method to prepare lentil food items, and compared the PPBG and relative glycemic responses (RGR of lentil and control foods; and (iii conducted consumer acceptability of the lentil foods. Eight food items were formulated from either whole lentil puree (test or instant potato (control. In separate PPBG studies, participants consumed fixed amounts of available carbohydrates from test foods, control foods, or a white bread standard. Finger prick blood samples were obtained at 0, 15, 30, 45, 60, 90, and 120 min after the first bite, analyzed for glucose, and used to calculate incremental area under the blood glucose response curve and RGR; glycemic index (GI was measured only for processed lentils. Mean GI (± standard error of the mean of processed lentils ranged from 25 ± 3 (boiled to 66 ± 6 (spray-dried; the GI of spray-dried lentils was significantly (p < 0.05 higher than boiled, pureed, or roasted lentil. Overall, lentil-based food items all elicited significantly lower RGR compared to potato-based items (40 ± 3 vs. 73 ± 3%; p < 0.001. Apricot chicken, chicken pot pie, and lemony parsley soup had the highest overall acceptability corresponding to “like slightly” to “like moderately”. Processing influenced the PPBG of lentils, but food items formulated from lentil puree significantly attenuated PPBG. Formulation was associated with significant differences in sensory attributes.
Pedersen, Mogens Jin; Nielsen, Christian Videbæk
Identifying ways to efficiently maximize the response rate to surveys is important to survey-based research. However, evidence on the response rate effect of donation incentives and especially altruistic and egotistic-type text appeal interventions is sparse and ambiguous. By a randomized survey...... experiment among 6,162 members of an online survey panel, this article shows how low-cost incentives and cost-free text appeal interventions may impact the survey response rate in online panels. The experimental treatments comprise (a) a cash prize lottery incentive, (b) two donation incentives equating...... survey response with a monetary donation to a good cause, (c) an egotistic-type text appeal, and (d) an altruistic-type text appeal. Relative to a control group, we find higher response rates among the recipients of the egotistic-type text appeal and the lottery incentive. Donation incentives yield lower...
Full Text Available Abstract Background This article describes the development and validation of a self-reported questionnaire, the KQoL-26, that is based on the views of patients with a suspected ligamentous or meniscal injury of the knee that assesses the impact of their knee problem on the quality of their lives. Methods Patient interviews and focus groups were used to derive questionnaire content. The instrument was assessed for data quality, reliability, validity, and responsiveness using data from a randomised trial and patient survey about general practitioners' use of Magnetic Resonance Imaging for patients with a suspected ligamentous or meniscal injury. Results Interview and focus group data produced a 40-item questionnaire designed for self-completion. 559 trial patients and 323 survey patients responded to the questionnaire. Following principal components analysis and Rasch analysis, 26 items were found to contribute to three scales of knee-related quality of life: physical functioning, activity limitations, and emotional functioning. Item-total correlations ranged from 0.60–0.82. Cronbach's alpha and test retest reliability estimates were 0.91–0.94 and 0.80–0.93 respectively. Hypothesised correlations with the Lysholm Knee Scale, EQ-5D, SF-36 and knee symptom questions were evidence for construct validity. The instrument produced highly significant change scores for 65 trial patients indicating that their knee was a little or somewhat better at six months. The new instrument had higher effect sizes (range 0.86–1.13 and responsiveness statistics (range 1.50–2.13 than the EQ-5D and SF-36. Conclusion The KQoL-26 has good evidence for internal reliability, test-retest reliability, validity and responsiveness, and is recommended for use in randomised trials and other evaluative studies of patients with a suspected ligamentous or meniscal injury.
Garratt, Andrew M; Brealey, Stephen; Robling, Michael; Atwell, Chris; Russell, Ian; Gillespie, William; King, David
This article describes the development and validation of a self-reported questionnaire, the KQoL-26, that is based on the views of patients with a suspected ligamentous or meniscal injury of the knee that assesses the impact of their knee problem on the quality of their lives. Patient interviews and focus groups were used to derive questionnaire content. The instrument was assessed for data quality, reliability, validity, and responsiveness using data from a randomised trial and patient survey about general practitioners' use of Magnetic Resonance Imaging for patients with a suspected ligamentous or meniscal injury. Interview and focus group data produced a 40-item questionnaire designed for self-completion. 559 trial patients and 323 survey patients responded to the questionnaire. Following principal components analysis and Rasch analysis, 26 items were found to contribute to three scales of knee-related quality of life: physical functioning, activity limitations, and emotional functioning. Item-total correlations ranged from 0.60-0.82. Cronbach's alpha and test retest reliability estimates were 0.91-0.94 and 0.80-0.93 respectively. Hypothesised correlations with the Lysholm Knee Scale, EQ-5D, SF-36 and knee symptom questions were evidence for construct validity. The instrument produced highly significant change scores for 65 trial patients indicating that their knee was a little or somewhat better at six months. The new instrument had higher effect sizes (range 0.86-1.13) and responsiveness statistics (range 1.50-2.13) than the EQ-5D and SF-36. The KQoL-26 has good evidence for internal reliability, test-retest reliability, validity and responsiveness, and is recommended for use in randomised trials and other evaluative studies of patients with a suspected ligamentous or meniscal injury.
Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia
The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…
van der Linden, Willem J.; Scrams, David J.; Schnipke, Deborah L.
This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has
Guàrdia-Olmos, J; Carbó-Carreté, M; Peró-Cebollero, M; Giné, C
The study of measurements of quality of life (QoL) is one of the great challenges of modern psychology and psychometric approaches. This issue has greater importance when examining QoL in populations that were historically treated on the basis of their deficiency, and recently, the focus has shifted to what each person values and desires in their life, as in cases of people with intellectual disability (ID). Many studies of QoL scales applied in this area have attempted to improve the validity and reliability of their components by incorporating various sources of information to achieve consistency in the data obtained. The adaptation of the Personal Outcomes Scale (POS) in Spanish has shown excellent psychometric attributes, and its administration has three sources of information: self-assessment, practitioner and family. The study of possible congruence or incongruence of observed distributions of each item between sources is therefore essential to ensure a correct interpretation of the measure. The aim of this paper was to analyse the observed distribution of items and dimensions from the three Spanish POS information sources cited earlier, using the item response theory. We studied a sample of 529 people with ID and their respective practitioners and family member, and in each case, we analysed items and factors using Samejima's model of polytomic ordinal scales. The results indicated an important number of items with differential effects regarding sources, and in some cases, they indicated significant differences in the distribution of items, factors and sources of information. As a result of this analysis, we must affirm that the administration of the POS, considering three sources of information, was adequate overall, but a correct interpretation of the results requires that it obtain much more information to consider, as well as some specific items in specific dimensions. The overall ratings, if these comments are considered, could result in bias. © 2017
Full Text Available To test the applicability of item response theory (IRT to the Korean Nurses' Licensing Examination (KNLE, item analysis was performed after testing the unidimensionality and goodness-of-fit. The results were compared with those based on classical test theory. The results of the 330-item KNLE administered to 12,024 examinees in January 2004 were analyzed. Unidimensionality was tested using DETECT and the goodness-of-fit was tested using WINSTEPS for the Rasch model and Bilog-MG for the two-parameter logistic model. Item analysis and ability estimation were done using WINSTEPS. Using DETECT, Dmax ranged from 0.1 to 0.23 for each subject. The mean square value of the infit and outfit values of all items using WINSTEPS ranged from 0.1 to 1.5, except for one item in pediatric nursing, which scored 1.53. Of the 330 items, 218 (42.7% were misfit using the two-parameter logistic model of Bilog-MG. The correlation coefficients between the difficulty parameter using the Rasch model and the difficulty index from classical test theory ranged from 0.9039 to 0.9699. The correlation between the ability parameter using the Rasch model and the total score from classical test theory ranged from 0.9776 to 0.9984. Therefore, the results of the KNLE fit unidimensionality and goodness-of-fit for the Rasch model. The KNLE should be a good sample for analysis according to the IRT Rasch model, so further research using IRT is possible.
Novakovic, A M; Krekels, E H J; Munafo, A; Ueckert, S; Karlsson, M O
In this study, we report the development of the first item response theory (IRT) model within a pharmacometrics framework to characterize the disease progression in multiple sclerosis (MS), as measured by Expanded Disability Status Score (EDSS). Data were collected quarterly from a 96-week phase III clinical study by a blinder rater, involving 104,206 item-level observations from 1319 patients with relapsing-remitting MS (RRMS), treated with placebo or cladribine. Observed scores for each EDSS item were modeled describing the probability of a given score as a function of patients' (unobserved) disability using a logistic model. Longitudinal data from placebo arms were used to describe the disease progression over time, and the model was then extended to cladribine arms to characterize the drug effect. Sensitivity with respect to patient disability was calculated as Fisher information for each EDSS item, which were ranked according to the amount of information they contained. The IRT model was able to describe baseline and longitudinal EDSS data on item and total level. The final model suggested that cladribine treatment significantly slows disease-progression rate, with a 20% decrease in disease-progression rate compared to placebo, irrespective of exposure, and effects an additional exposure-dependent reduction in disability progression. Four out of eight items contained 80% of information for the given range of disabilities. This study has illustrated that IRT modeling is specifically suitable for accurate quantification of disease status and description and prediction of disease progression in phase 3 studies on RRMS, by integrating EDSS item-level data in a meaningful manner.
Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily
The validity of the CAGE using item response theory (IRT) has not yet been examined in older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores were evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogenous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and the ones with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2-item across ethnic group. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information of each item in the assessment of the overall severity of alcohol problem and the precision of the cut-off scores in older adult population.
Teresi, Jeanne A; Ocepek-Welikson, Katja; Lichtenberg, Peter A
The focus of these analyses was to examine the psychometric properties of the Lichtenberg Financial Decision Screening Scale (LFDSS). The purpose of the screen was to evaluate the decisional abilities and vulnerability to exploitation of older adults. Adults aged 60 and over were interviewed by social, legal, financial, or health services professionals who underwent in-person training on the administration and scoring of the scale. Professionals provided a rating of the decision-making abilities of the older adult. The analytic sample included 213 individuals with an average age of 76.9 (SD = 10.1). The majority (57%) were female. Data were analyzed using item response theory (IRT) methodology. The results supported the unidimensionality of the item set. Several IRT models were tested. Ten ordinal and binary items evidenced a slightly higher reliability estimate (0.85) than other versions and better coverage in terms of the range of reliable measurement across the continuum of financial incapacity.
Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; Andrade, Dalton Francisco de; Barbetta, Pedro Alberto; Souza, Ana Célia Caetano de; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia
To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL - Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used the Gradual Response Model of Samejima. The analyses were conducted using the free software R with the aid of psych and mirt. The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. Analisar o Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL) por meio da Teoria da Resposta ao Item. Estudo analítico realizado com 712 pessoas com hipertensão arterial atendidas em 13 unidades de atenção primária em saúde de Fortaleza, CE, em 2015. As etapas da an
Levine, Stephen Z; Rabinowitz, Jonathan; Rizopoulos, Dimitris
The adequacy of the Positive and Negative Syndrome Scale (PANSS) items in measuring symptom severity in schizophrenia was examined using Item Response Theory (IRT). Baseline PANSS assessments were analyzed from two multi-center clinical trials of antipsychotic medication in chronic schizophrenia (n=1872). Generally, the results showed that the PANSS (a) item ratings discriminated symptom severity best for the negative symptoms; (b) has an excess of "Severe" and "Extremely severe" rating options; and (c) assessments are more reliable at medium than very low or high levels of symptom severity. Analysis also showed that the detection of statistically and non-statistically significant differences in treatment were highly similar for the original and IRT-modified PANSS. In clinical trials of chronic schizophrenia, the PANSS appears to require the following modifications: fewer rating options, adjustment of 'Lack of judgment and insight', and improved severe symptom assessment. 2011 Elsevier Ltd. All rights reserved.
Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana
The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (getting around, self-care, getting along with others, life activities and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36 items WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. Copyright © 2014 John Wiley & Sons, Ltd.
Grigg, Kaine; Manderson, Lenore
Racism and associated discrimination are pervasive and persistent challenges with multiple cumulative deleterious effects contributing to inequities in various health outcomes. Globally, research over the past decade has shown consistent associations between racism and negative health concerns. Such research confirms that race endures as one of the strongest predictors of poor health. Due to the lack of validated Australian measures of racist attitudes, RACES (Racism, Acceptance, and Cultural-Ethnocentrism Scale) was developed. Here, we examine RACES' psychometric properties, including the latent structure, utilising Item Response Theory (IRT). Unidimensional and Multidimensional Rating Scale Model (RSM) Rasch analyses were utilised with 296 Victorian primary school students and 182 adolescents and 220 adults from the Australian community. RACES was demonstrated to be a robust 24-item three-dimensional scale of Accepting Attitudes (12 items), Racist Attitudes (8 items), and Ethnocentric Attitudes (4 items). RSM Rasch analyses provide strong support for the instrument as a robust measure of racist attitudes in the Australian context, and for the overall factorial and construct validity of RACES across primary school children, adolescents, and adults. RACES provides a reliable and valid measure that can be utilised across the lifespan to evaluate attitudes towards all racial, ethnic, cultural, and religious groups. A core function of RACES is to assess the effectiveness of interventions to reduce community levels of racism and in turn inequities in health outcomes within Australia.
Full Text Available The current state of research does not tell us much about citizens’ expectations of political decision making. Most surveys allow respondents to evaluate how the current system is working, but do not inquire about alternative political decision-making procedures. The lack of established survey items can be explained by the fact that radical changes in decision-making procedures have been hard to envisage, but also by a general scepticism regarding people’s ability to form opinions on these matters. Political processes are, without doubt, complex matters that do not lend themselves very well to simplistic survey questions. Moreover, previous research has convincingly shown that most people in general have difficulties forming single, coherent and stable attitudes even towards far more straightforward political issues. In order to determine if trying to grasp attitudes towards political decision-making in future empirical studies can be considered a fruitful endeavour, this study sets out to critically assess the extent to which people express coherent preferences on these matters, and if preferences are in line with expectations in previous, rather scattered research. The study is based on the Finnish National Election Study 2011; a study which, contrary to most other election studies, includes a rich variety of survey items on the topic, and utilises a combination of strategies in order to explore patterns in the opinions held by citizens.
El estado actual de las investigaciones no nos dice mucho sobre las expectativas de los ciudadanos con respecto a la toma de decisiones políticas. La mayoría de las encuestas permiten que quienes las responden evalúen cómo funciona el sistema actual, pero no preguntan por procedimientos alternativos de decisión política. La falta de preguntas de encuesta contrastadas se puede explicar tanto por el hecho de que los cambios en los procedimientos de toma de decisiones han resultado difíciles de
Bogen, Karen; Biener, Lois; Garrett, Catherine A; Allen, Jane; Cummings, K Michael; Hartman, Anne; Marcus, Stephen; McNeill, Ann; O'Connor, Richard J; Parascandola, Mark; Pederson, Linda
Background Over the past decade, tobacco companies have introduced cigarettes and smokeless tobacco products (known as Potential Reduced Exposure Products, PREPs) with purportedly lower levels of some toxins than conventional cigarettes and smokeless products. It is essential that public health agencies monitor awareness, interest, use, and perceptions of these products so that their impact on population health can be detected at the earliest stages. Methods This paper reviews and critiques existing strategies for measuring awareness of PREPs from 16 published and unpublished studies. From these measures, we developed new surveillance items and subjected them to two rounds of cognitive testing, a common and accepted method for evaluating questionnaire wording. Results Our review suggests that high levels of awareness of PREPs reported in some studies are likely to be inaccurate. Two likely sources of inaccuracy in awareness measures were identified: 1) the tendency of respondents to misclassify "no additive" and "natural" cigarettes as PREPs and 2) the tendency of respondents to mistakenly report awareness as a result of confusion between PREPs brands and similarly named familiar products, for example, Eclipse chewing gum and Accord automobiles. Conclusion After evaluating new measures with cognitive interviews, we conclude that as of winter 2006, awareness of reduced exposure products among U.S. smokers was likely to be between 1% and 8%, with the higher estimates for some products occurring in test markets. Recommended measurement strategies for future surveys are presented. PMID:19840394
Full Text Available Abstract Background Over the past decade, tobacco companies have introduced cigarettes and smokeless tobacco products (known as Potential Reduced Exposure Products, PREPs with purportedly lower levels of some toxins than conventional cigarettes and smokeless products. It is essential that public health agencies monitor awareness, interest, use, and perceptions of these products so that their impact on population health can be detected at the earliest stages. Methods This paper reviews and critiques existing strategies for measuring awareness of PREPs from 16 published and unpublished studies. From these measures, we developed new surveillance items and subjected them to two rounds of cognitive testing, a common and accepted method for evaluating questionnaire wording. Results Our review suggests that high levels of awareness of PREPs reported in some studies are likely to be inaccurate. Two likely sources of inaccuracy in awareness measures were identified: 1 the tendency of respondents to misclassify "no additive" and "natural" cigarettes as PREPs and 2 the tendency of respondents to mistakenly report awareness as a result of confusion between PREPs brands and similarly named familiar products, for example, Eclipse chewing gum and Accord automobiles. Conclusion After evaluating new measures with cognitive interviews, we conclude that as of winter 2006, awareness of reduced exposure products among U.S. smokers was likely to be between 1% and 8%, with the higher estimates for some products occurring in test markets. Recommended measurement strategies for future surveys are presented.
Full Text Available To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to the hearing aid rehabilitation questionnaire scales aid stigma, pressure, and aid unwanted addressing respectively hearing aid stigma, experienced pressure from others; perceived hearing aid benefit were evaluated with item response theory. The sample was comprised of 212 persons aged 55 years or more; 63 were hearing aid users, 64 with and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired.
Gu, Chenyang; Gutman, Roee
The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.
Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien
To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to the hearing aid rehabilitation questionnaire scales aid stigma, pressure, and aid unwanted addressing respectively hearing aid stigma, experienced pressure from others; perceived hearing aid benefit were evaluated with item response theory. The sample was comprised of 212 persons aged 55 years or more; 63 were hearing aid users, 64 with and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired. PMID:28028428
Krekels, Ehj; Novakovic, A M; Vermeulen, A M; Friberg, L E; Karlsson, M O
As biomarkers are lacking, multi-item questionnaire-based tools like the Positive and Negative Syndrome Scale (PANSS) are used to quantify disease severity in schizophrenia. Analyzing composite PANSS scores as continuous data discards information and violates the numerical nature of the scale. Here a longitudinal analysis based on Item Response Theory is presented using PANSS data from phase III clinical trials. Latent disease severity variables were derived from item-level data on the positive, negative, and general PANSS subscales each. On all subscales, the time course of placebo responses were best described with Weibull models, and dose-independent functions with exponential models to describe the onset of the full effect were used to describe paliperidone's effect. Placebo and drug effect were most pronounced on the positive subscale. The final model successfully describes the time course of treatment effects on the individual PANSS item-levels, on all PANSS subscale levels, and on the total score level. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Hoertel, Nicolas; Peyre, Hugo; Lavaud, Pierre; Blanco, Carlos; Guerin-Langlois, Christophe; René, Margaux; Schuster, Jean-Pierre; Lemogne, Cédric; Delorme, Richard; Limosin, Frédéric
The limited published literature on the subject suggests that there may be differences in how females and males experience narcissistic personality disorder (NPD) symptoms. The aim of this study was to use methods based on item response theory to examine whether, when equating for levels of NPD symptom severity, there are sex differences in the likelihood of reporting DSM-IV-TR NPD symptoms. We conducted these analyses using a large, nationally representative sample from the USA (n=34,653), the second wave of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). There were statistically and clinically significant sex differences for 2 out of the 9 DSM-IV-TR NPD symptoms. We found that males were more likely to endorse the item 'lack of empathy' at lower levels of narcissistic personality disorder severity than females. The item 'being envious' was a better indicator of NPD severity in males than in females. There were no clinically significant sex differences on the remaining NPD symptoms. Overall, our findings indicate substantial sex differences in narcissistic personality disorder symptom expression. Although our results may reflect sex-bias in diagnostic criteria, they are consistent with recent views suggesting that narcissistic personality disorder may be underpinned by shared and sex-specific mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.
Leventhal, Brian C.; Stone, Clement A.
Interest in Bayesian analysis of item response theory (IRT) models has grown tremendously due to the appeal of the paradigm among psychometricians, advantages of these methods when analyzing complex models, and availability of general-purpose software. Possible models include models which reflect multidimensionality due to designed test structure,…
Full Text Available Abstract Background Many continuous item responses (CIRs are encountered in healthcare settings, but no one uses item response theory’s (IRT probabilistic modeling to present graphical presentations for interpreting CIR results. A computer module that is programmed to deal with CIRs is required. To present a computer module, validate it, and verify its usefulness in dealing with CIR data, and then to apply the model to real healthcare data in order to show how the CIR that can be applied to healthcare settings with an example regarding a safety attitude survey. Methods Using Microsoft Excel VBA (Visual Basic for Applications, we designed a computer module that minimizes the residuals and calculates model’s expected scores according to person responses across items. Rasch models based on a Wright map and on KIDMAP were demonstrated to interpret results of the safety attitude survey. Results The author-made CIR module yielded OUTFIT mean square (MNSQ and person measures equivalent to those yielded by professional Rasch Winsteps software. The probabilistic modeling of the CIR module provides messages that are much more valuable to users and show the CIR advantage over classic test theory. Conclusions Because of advances in computer technology, healthcare users who are familiar to MS Excel can easily apply the study CIR module to deal with continuous variables to benefit comparisons of data with a logistic distribution and model fit statistics.
Chien, Tsair-Wei; Shao, Yang; Kuo, Shu-Chun
Many continuous item responses (CIRs) are encountered in healthcare settings, but no one uses item response theory's (IRT) probabilistic modeling to present graphical presentations for interpreting CIR results. A computer module that is programmed to deal with CIRs is required. To present a computer module, validate it, and verify its usefulness in dealing with CIR data, and then to apply the model to real healthcare data in order to show how the CIR that can be applied to healthcare settings with an example regarding a safety attitude survey. Using Microsoft Excel VBA (Visual Basic for Applications), we designed a computer module that minimizes the residuals and calculates model's expected scores according to person responses across items. Rasch models based on a Wright map and on KIDMAP were demonstrated to interpret results of the safety attitude survey. The author-made CIR module yielded OUTFIT mean square (MNSQ) and person measures equivalent to those yielded by professional Rasch Winsteps software. The probabilistic modeling of the CIR module provides messages that are much more valuable to users and show the CIR advantage over classic test theory. Because of advances in computer technology, healthcare users who are familiar to MS Excel can easily apply the study CIR module to deal with continuous variables to benefit comparisons of data with a logistic distribution and model fit statistics.
Wattanakasiwich, P.; Ananta, S.
In physics education research, the main goal is to improve physics teaching so that most students understand physics conceptually and be able to apply concepts in solving problems. Therefore many multiple-choice instruments were developed to probe students' conceptual understanding in various topics. Two techniques including model analysis and item response curves were used to analyze students' responses from Force and Motion Conceptual Evaluation (FMCE). For this study FMCE data from more than 1000 students at Chiang Mai University were collected over the past three years. With model analysis, we can obtain students' alternative knowledge and the probabilities for students to use such knowledge in a range of equivalent contexts. The model analysis consists of two algorithms—concentration factor and model estimation. This paper only presents results from using the model estimation algorithm to obtain a model plot. The plot helps to identify a class model state whether it is in the misconception region or not. Item response curve (IRC) derived from item response theory is a plot between percentages of students selecting a particular choice versus their total score. Pros and cons of both techniques are compared and discussed.
Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix
To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item-response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters that derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org) and offers a long-term perspective to improve the comparability of patient-reported outcome measures. Copyright © 2016 Elsevier Inc. All rights reserved.
Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D
The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
Suzanne M. Joy; Richard T. Reynolds; Douglas G. Leslie
We examined responses of Northern Goshawks (Accipter gentilis) to taped broadcast calls of conspecifics in tree-harvest areas and around alternate goshawk nests on Kaibab National Forest, Arizona, in 1991 and 1992. Forest areas totaling 476 km2 were systematically surveyed for goshawks. Ninety responses by adult and juvenile goshawks were elicited...
Wetzel, Eunike; Hell, Benedikt
Vocational interest inventories are commonly analyzed using a unidimensional approach, that is, each subscale is analyzed separately. However, the theories on which these inventories are based often postulate specific relationships between the interest traits. This article presents a multidimensional approach to the analysis of vocational interest data, which takes these relationships into account. Models in the framework of Multidimensional Item Response Theory (MIRT) are explained and appli...
Meijer, R.R.; de Vries, Rivka M.; van Bruggen, Vincent
The psychometric structure of the Brief Symptom Inventory–18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a
Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent
The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a strong Mokken scale for outpatients and…
Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent
The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a
Meijer, R.R.; Egberink, I.J.L.; Emons, Wilco H.M.; Sijtsma, Klaas
We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985)Self-Perception Profile
Ranger, Jochen; Kuhn, Jörg-Tobias; Szardenings, Carsten
Psychological tests are usually analysed with item response models. Recently, some alternative measurement models have been proposed that were derived from cognitive process models developed in experimental psychology. These models consider the responses but also the response times of the test takers. Two such models are the Q-diffusion model and the D-diffusion model. Both models can be calibrated with the diffIRT package of the R statistical environment via marginal maximum likelihood (MML) estimation. In this manuscript, an alternative approach to model calibration is proposed. The approach is based on weighted least squares estimation and parallels the standard estimation approach in structural equation modelling. Estimates are determined by minimizing the discrepancy between the observed and the implied covariance matrix. The estimator is simple to implement, consistent, and asymptotically normally distributed. Least squares estimation also provides a test of model fit by comparing the observed and implied covariance matrix. The estimator and the test of model fit are evaluated in a simulation study. Although parameter recovery is good, the estimator is less efficient than the MML estimator. © 2016 The British Psychological Society.
Jeschonek, Susanna; Marinovic, Vesna; Hoehl, Stefanie; Elsner, Birgit; Pauen, Sabina
One of the earliest categorical distinctions to be made by preverbal infants is the animate-inanimate distinction. To explore the neural basis for this distinction in 7-8-month-olds, an equal number of animal and furniture pictures was presented in an ERP-paradigm. The total of 118 pictures, all looking different from each other, were presented in a semi-randomized order for 1000ms each. Infants' brain responses to exemplars from both categories differed systematically regarding the negative central component (Nc: 400-600ms) at anterior channels. More specifically, the Nc was enhanced for animals in one subgroup of infants, and for furniture items in another subgroup of infants. Explorative analyses related to categorical priming further revealed category-specific differences in brain responses in the late time window (650-1550ms) at right frontal channels: Unprimed stimuli (preceded by a different-category item) elicited a more positive response as compared to primed stimuli (preceded by a same-category item). In sum, these findings suggest that the infant's brain discriminates exemplars from both global domains. Given the design of our task, we conclude that processes of category identification are more likely to account for our findings than processes of on-line category formation during the experimental session. Copyright © 2009 Elsevier B.V. All rights reserved.
Thompson, Bruce; Cook, Colleen; Kyrillidou, Martha
The LibQUAL+[TM] protocol solicits open-ended comments from users with regard to library service quality, gathers data on 22 core items, and, at the option of individual libraries, also garners ratings on five items drawn from a pool of more than 100 choices selected by libraries. In this article, the relationship of scores on these locally…
DiStefano, Christine; Motl, Robert W.
This article used multitrait-multimethod methodology and covariance modeling for an investigation of the presence and correlates of method effects associated with negatively worded items on the Rosenberg Self-Esteem (RSE) scale (Rosenberg, 1989) using a sample of 757 adults. Results showed that method effects associated with negative item phrasing…
Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.
Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…
In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…
Raykov, Tenko; Marcoulides, George A.
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…
The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…
Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.
This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike the classical testing theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.
Moore, Mariah; Gordon, Peter C
In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
Ueckert, Sebastian; Plan, Elodie L; Ito, Kaori; Karlsson, Mats O; Corrigan, Brian; Hooker, Andrew C
This work investigates improved utilization of ADAS-cog data (the primary outcome in Alzheimer's disease (AD) trials of mild and moderate AD) by combining pharmacometric modeling and item response theory (IRT). A baseline IRT model characterizing the ADAS-cog was built based on data from 2,744 individuals. Pharmacometric methods were used to extend the baseline IRT model to describe longitudinal ADAS-cog scores from an 18-month clinical study with 322 patients. Sensitivity of the ADAS-cog items in different patient populations as well as the power to detect a drug effect in relation to total score based methods were assessed with the IRT based model. IRT analysis was able to describe both total and item level baseline ADAS-cog data. Longitudinal data were also well described. Differences in the information content of the item level components could be quantitatively characterized and ranked for mild cognitively impairment and mild AD populations. Based on clinical trial simulations with a theoretical drug effect, the IRT method demonstrated a significantly higher power to detect drug effect compared to the traditional method of analysis. A combined framework of IRT and pharmacometric modeling permits a more effective and precise analysis than total score based methods and therefore increases the value of ADAS-cog data.
Hwang, M. J.; Kim, K. Y.; Yang, Z. A.
In this paper, we surveyed the cost and benefit items and evaluated the costs against performing the Maintenance Rule. In the past, only one electric power company had provided the electricity without free competition in Korea. In these days, however, the electric power company was divided into two parts by the sources: atomic and hydraulic generation and thermal-power generation. Therefore, the generation sources that done have competitiveness at the price will be weeded out in the electric power market. Although the preferential goal is on the safe operation at the Nuclear power Plants (NPPs), if too much money is required to maintain or improve the safety of the NPP, the licensee could hesitate to adopt the program related to the safety even though it is a good one. Since the Risk-Informed Applications (RIA) have been using for a plant operation in recent, the condition of a plant might be changed. Therefore, considering the affects of the RIA, a method to keep the capability through the monitoring the maintenance effectiveness has been proposed. However, to perform this, a number of works, continuous collecting data and monitoring the maintenance effectiveness and understanding the reason of degrading capability, should be preceded. Therefore, a lot of man-hour is needed to develop and to manage the application method, and the licensee should pay the costs. Therefore, in the domestic circumstance, it is necessary to evaluate the cost to monitor the maintenance effectiveness. Hence, we are going to examine the cost to perform the MR and its anticipated benefit lists
Shahidi, Gholam Ali; Ghaempanah, Zeinab; Khalili, Yasaman; Nojomi, Marzieh
Assessment of quality-of-life (QOF) as an outcome measure after deep brain stimulation (DBS) surgery in patients with Parkinson's disease (PD) need a valid, reliable and responsive instrument. The aim of the current study was to determine responsiveness of validated Persian version of PD questionnaire with 39-items (PDQ-39) after DBS surgery in patients with PD. Eleven patients with PD, who were candidate for DBS operation between May 2012 and June 2013 were assessed. PDQ-39 and short-form questionnaire with 36-items (SF-36) were used. To assess responsiveness of PDQ-39 standardized response mean (SRM) was used. Mean age was 51.8 (8.8) and all of the patients, but just one were male (10 patients). Mean duration of the disease was 8.7 (2.1) years. Eight patients were categorized as moderate using Hoehn and Yahr (H and Y) classification. All patients had a better H and Y score compared with the baseline evaluation (3.09 vs. 0.79). The amount of SRM was above 0.70 for all domains means a large responsiveness for PDQ-39. Persian version of PDQ-39 has an acceptable responsiveness and could be used to assess as an outcome measure to evaluate the effect of therapies on PD.
Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey A
The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees' latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box-Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested. © 2012 The British Psychological Society.
Christensen, Anne Illemann; Ekholm, Ola; Glümer, Charlotte
.7%). Marital status, ethnic background and highest completed education were associated with non-response in both modes. Furthermore, sex and age were associated with non-response in the self-administered mode. No significant mode effects were observed for indicators related to use of health services......BACKGROUND: While face-to-face interviews are considered the gold standard of survey modes, self-administered questionnaires are often preferred for cost and convenience. This article examines response patterns in two general population health surveys carried out by face-to-face interview and self......-administered questionnaire, respectively. METHOD: Data derives from a health interview survey in the Region of Southern Denmark (face-to-face interview) and The Danish Health and Morbidity Survey 2010 (self-administered questionnaire). Identical questions were used in both surveys. Data on all individuals were obtained from...
Martin Hernani Merino
Full Text Available There has been growing recognition of the importance of creating performance measurement tools for the economic, social and environmental management of micro and small enterprise (MSE. In this context, this study aims to validate an instrument to assess perceptions of sustainable development practices by MSEs by means of a Graded Response Model (GRM with a Bayesian approach to Item Response Theory (IRT. The results based on a sample of 506 university students in Peru, suggest that a valid measurement instrument was achieved. At the end of the paper, methodological and managerial contributions are presented.
Full Text Available The R package ltm has been developed for the analysis of multivariate dichotomous and polytomous data using latent variable models, under the Item Response Theory approach. For dichotomous data the Rasch, the Two-Parameter Logistic, and Birnbaum's Three-Parameter models have been implemented, whereas for polytomous data Semejima's Graded Response model is available. Parameter estimates are obtained under marginal maximum likelihood using the Gauss-Hermite quadrature rule. The capabilities and features of the package are illustrated using two real data examples.
Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS
Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and estimate the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Conclusion: Item response models are useful for medical test analyses and provide valuable information about model comparisons and identification of differential items other than test reliability, item difficulty, and examinee's ability.
Full Text Available Abstract Background The purpose of the study was to determine whether the Persian version of the KIDSCREEN-27 has the optimal number of response category to measure health-related quality of life (HRQoL in children and adolescents. Moreover, we aimed to determine if all the items contributed adequately to their own domain. Findings The Persian version of the KIDSCREEN-27 was completed by 1083 school children and 1070 of their parents. The Rasch partial credit model (PCM was used to investigate item statistics and ordering of response categories. The PCM showed that no item was misfitting. The PCM also revealed that, successive response categories for all items were located in the expected order except for category 1 in self- and proxy-reports. Conclusions Although Rasch analysis confirms that all the items belong to their own underlying construct, response categories should be reorganized and evaluated in further studies, especially in children with chronic conditions.
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina
As there is a lack of evidence attesting the equivalent item functioning across genders for the most employed instruments used to measure pathological gambling in adolescence, the present study was aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to relieve truly gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest to change the content of the MSWS-12. © The Author(s), 2016.
Hohensinn, Christine; Kubinger, Klaus D.
In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…
Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd
The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52,2; standard deviation 19.2) The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
Marcoulides, Katerina M.
This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods were examined. Overall results showed that…
Matlock Cole, Ki Lynn; Turner, Ronna C.; Gitchel, W. Dent
The generalized partial credit model (GPCM) is often used for polytomous data; however, the nominal response model (NRM) allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. Ten items from three different self-reported scales were used (anxiety, depression, and…
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
With the online assessment becoming mainstream and the recording of response times becoming straightforward, the importance of response times as a measure of psychological constructs has been recognized and the literature of modeling times has been growing during the last few decades. Previous studies have tried to formulate models and theories to…
Barry, Adam E; Goodson, Patricia
Report on the development and psychometric testing of a theoretically and evidence-grounded instrument, the Characteristics of Responsible Drinking Survey (CHORDS). Instrument subjected to four phases of pretesting (cognitive validity, cognitive and motivational qualities, pilot test, and item evaluation) and a final posttest implementation. Large public university in Texas. Randomly selected convenience sample (n = 729) of currently enrolled students. This 78-item questionnaire measures individuals' responsible drinking beliefs, motivations, intentions, and behaviors. Cronbach α, split-half reliability, principal components analysis and Spearman ρ were conducted to investigate reliability, stability, and validity. Measures in the CHORDS exhibited high internal consistency reliability and strong correlations of split-half reliability. Factor analyses indicated five distinct scales were present, as proposed in the theoretical model. Subscale composite scores also exhibited a correlation to alcohol consumption behaviors, indicating concurrent validity. The CHORDS represents the first instrument specifically designed to assess responsible drinking beliefs and behaviors. It was found to elicit valid and reliable data among a college student sample. This instrument holds much promise for practitioners who desire to empirically investigate dimensions of responsible drinking.
Saint-Maurice, Pedro F; Welk, Gregory J; Bartee, R Todd; Heelan, Kate
This study tests calibration models to re-scale context-specific physical activity (PA) items to accelerometer-derived PA. A total of 195 4th-12th grades children wore an Actigraph monitor and completed the Physical Activity Questionnaire (PAQ) one week later. The relative time spent in moderate-to-vigorous PA (MVPA % ) obtained from the Actigraph at recess, PE, lunch, after-school, evening and weekend was matched with a respective item score obtained from the PAQ's. Item scores from 145 participants were calibrated against objective MVPA % using multiple linear regression with age, and sex as additional predictors. Predicted minutes of MVPA for school, out-of-school and total week were tested in the remaining sample (n = 50) using equivalence testing. The results showed that PAQ β-weights ranged from 0.06 (lunch) to 4.94 (PE) MVPA % (P PAQ and accelerometer MVPA at school and out-of-school ranged from -15.6 to +3.8 min and the PAQ was within 10-15% of accelerometer measured activity. This study demonstrated that context-specific items can be calibrated to predict minutes of MVPA in groups of youth during in- and out-of-school periods.
This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
Jian Ming Luo
Full Text Available Macau gambling companies included Corporate Social Responsibility (CSR information in their annual reports and websites as a marketing tool. Responsible Gambling (RG had been a recurring issue in Macau’s chief executive report since 2007 and in many of the major gambling operators’ annual report. The purpose of this study was to develop a measurement scale on CSR activities in Macau. Items on the measurement scale were based on qualitative research with data collected from employees in Macau’s gambling industry and academic literature. First and Second Order confirmatory factor analysis (CFA were used to verify the reliability and validity of the measurement scale. The results of this study were satisfactory and were supported by empirical evidence. This study provided recommendations to gambling stakeholders, including practitioners, government officers, customers and shareholders, and implications to promote CSR practice in Macau gambling industry.
Morren, M.H.; Gelissen, J.P.T.M.; Vermunt, J.K.
This article addresses the following research questions: Do respondents participating in cross-cultural surveys differ in their response style when responding to attitude statements? If so, are characteristics of the response process associated with their ethnicity and generation of immigration? To
Hidalgo, María D; López-Martínez, María D; Gómez-Benito, Juana; Guilera, Georgina
Short scales are typically used in the social, behavioural and health sciences. This is relevant since test length can influence whether items showing DIF are correctly flagged. This paper compares the relative effectiveness of discriminant logistic regression (DLR) and IRTLRDIF for detecting DIF in polytomous short tests. A simulation study was designed. Test length, sample size, DIF amount and item response categories number were manipulated. Type I error and power were evaluated. IRTLRDIF and DLR yielded Type I error rates close to nominal level in no-DIF conditions. Under DIF conditions, Type I error rates were affected by test length DIF amount, degree of test contamination, sample size and number of item response categories. DLR showed a higher Type I error rate than did IRTLRDIF. Power rates were affected by DIF amount and sample size, but not by test length. DLR achieved higher power rates than did IRTLRDIF in very short tests, although the high Type I error rate involved means that this result cannot be taken into account. Test length had an important impact on the Type I error rate. IRTLRDIF and DLR showed a low power rate in short tests and with small sample sizes.
Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong
Classic theory test has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was performed to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined and a short-form of RS-SC maybe feasible in clinical settings in order to reduce burden on patients. However, the generalizability of these findings warrants further investigations.
Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J
To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for different item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational
Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka
Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was to first review published Principle Components Analysis (PCA) of the PANSS, and extract items most frequently included in the negative domain, and secondly, to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCA met were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading on one or more PCA. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5 were discarded. Fourth, resulting items were included in a non-parametric IRT and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). 15 items loaded on a negative domain in at least one study, with Emotional Withdrawal loading on all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M
In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Meade, Adam W.; Craig, S. Bartholomew
When data are collected via anonymous Internet surveys, particularly under conditions of obligatory participation (such as with student samples), data quality can be a concern. However, little guidance exists in the published literature regarding techniques for detecting careless responses. Previously several potential approaches have been…
Maples-Keller, Jessica L; Williamson, Rachel L; Sleep, Chelsea E; Carter, Nathan T; Campbell, W Keith; Miller, Joshua D
Given advantages of freely available and modifiable measures, an increase in the use of measures developed from the International Personality Item Pool (IPIP), including the 300-item representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992a ) has occurred. The focus of this study was to use item response theory to develop a 60-item, IPIP-based measure of the Five-Factor Model (FFM) that provides equal representation of the FFM facets and to test the reliability and convergent and criterion validity of this measure compared to the NEO Five Factor Inventory (NEO-FFI). In an undergraduate sample (n = 359), scores from the NEO-FFI and IPIP-NEO-60 demonstrated good reliability and convergent validity with the NEO PI-R and IPIP-NEO-300. Additionally, across criterion variables in the undergraduate sample as well as a community-based sample (n = 757), the NEO-FFI and IPIP-NEO-60 demonstrated similar nomological networks across a wide range of external variables (r ICC = .96). Finally, as expected, in an MTurk sample the IPIP-NEO-60 demonstrated advantages over the Big Five Inventory-2 (Soto & John, 2017 ; n = 342) with regard to the Agreeableness domain content. The results suggest strong reliability and validity of the IPIP-NEO-60 scores.
Balsis, Steve; Choudhury, Tabina K; Geraci, Lisa; Benge, Jared F; Patrick, Christopher J
Alzheimer's disease (AD) affects neurological, cognitive, and behavioral processes. Thus, to accurately assess this disease, researchers and clinicians need to combine and incorporate data across these domains. This presents not only distinct methodological and statistical challenges but also unique opportunities for the development and advancement of psychometric techniques. In this article, we describe relatively recent research using item response theory (IRT) that has been used to make progress in assessing the disease across its various symptomatic and pathological manifestations. We focus on applications of IRT to improve scoring, test development (including cross-validation and adaptation), and linking and calibration. We conclude by describing potential future multidimensional applications of IRT techniques that may improve the precision with which AD is measured.
Ferreira, Aristides I; Almeida, Leandro S; Prieto, Gerardo
In accordance with Item Response Theory, a computer memory battery with six tests was constructed for use in the Portuguese adult population. A factor analysis was conducted to assess the internal structure of the tests (N = 547 undergraduate students). According to the literature, several confirmatory factor models were evaluated. Results showed better fit of a model with two independent latent variables corresponding to verbal and non-verbal factors, reproducing the initial battery organization. Internal consistency reliability for the six tests were alpha = .72 to .89. IRT analyses (Rasch and partial credit models) yielded good Infit and Outfit measures and high precision for parameter estimation. The potential utility of these memory tasks for psychological research and practice willbe discussed.
Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander
An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.
Azevedo, Caio L. N.; Migon, Helio S.
In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two parameter logit and probit models, since the degrees of freedom (df) play a similar role to the discrimination parameter. However, the behavior of the curves of the GtL is different from those of the two parameter models and the usual Student t link, since in GtL the curve obtained from different df's can cross the probit curves in more than one latent trait level. The GtL model has similar proprieties to the generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for that models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within Gibbs sampling algorithm. We consider a prior sensitivity choice concerning the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two parameter models.
Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.
This study used an item response theory (IRT) model and a large adolescent sample of justice involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL: YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle”, and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL: YV items function differently between boys and girls. PMID:25580672
Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal
The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Still few studies on its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor and three-factor model models were tested and shown respectively good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98). Using IRT, 13 problematic items were identified, of which 9 were common with CTT. This study tested different models with fewer problematic items found in a three-factor model. Using a non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Titman, Andrew C; Lancaster, Gillian A; Colver, Allan F
Both item response theory and structural equation models are useful in the analysis of ordered categorical responses from health assessment questionnaires. We highlight the advantages and disadvantages of the item response theory and structural equation modelling approaches to modelling ordinal data, from within a community health setting. Using data from the SPARCLE project focussing on children with cerebral palsy, this paper investigates the relationship between two ordinal rating scales, the KIDSCREEN, which measures quality-of-life, and Life-H, which measures participation. Practical issues relating to fitting models, such as non-positive definite observed or fitted correlation matrices, and approaches to assessing model fit are discussed. item response theory models allow properties such as the conditional independence of particular domains of a measurement instrument to be assessed. When, as with the SPARCLE data, the latent traits are multidimensional, structural equation models generally provide a much more convenient modelling framework. © The Author(s) 2013.
Dowling, N Maritza; Bolt, Daniel M; Deng, Sien; Li, Chenxi
Patient-reported outcome (PRO) measures play a key role in the advancement of patient-centered care research. The accuracy of inferences, relevance of predictions, and the true nature of the associations made with PRO data depend on the validity of these measures. Errors inherent to self-report measures can seriously bias the estimation of constructs assessed by the scale. A well-documented disadvantage of self-report measures is their sensitivity to response style (RS) effects such as the respondent's tendency to select the extremes of a rating scale. Although the biasing effect of extreme responding on constructs measured by self-reported tools has been widely acknowledged and studied across disciplines, little attention has been given to the development and systematic application of methodologies to assess and control for this effect in PRO measures. We review the methodological approaches that have been proposed to study extreme RS effects (ERS). We applied a multidimensional item response theory model to simultaneously estimate and correct for the impact of ERS on trait estimation in a PRO instrument. Model estimates were used to study the biasing effects of ERS on sum scores for individuals with the same amount of the targeted trait but different levels of ERS. We evaluated the effect of joint estimation of multiple scales and ERS on trait estimates and demonstrated the biasing effects of ERS on these trait estimates when used as explanatory variables. A four-dimensional model accounting for ERS bias provided a better fit to the response data. Increasing levels of ERS showed bias in total scores as a function of trait estimates. The effect of ERS was greater when the pattern of extreme responding was the same across multiple scales modeled jointly. The estimated item category intercepts provided evidence of content independent category selection. Uncorrected trait estimates used as explanatory variables in prediction models showed downward bias. A
The graded response model (GRM), which is based on item response theory (IRT), was used to evaluate the psychometric properties of the inattention and hyperactivity/impulsivity symptoms in an ADHD rating scale. To accomplish this, parents and teachers completed the DSM-IV ADHD Rating Scale (DARS; Gomez et al., "Journal of Child Psychology and…
Stuart, Heather; Patten, Scott B; Koller, Michelle; Modgill, Geeta; Liinamaa, Tiina
Objective: Our paper presents findings from the first population survey of stigma in Canada using a new measure of stigma. Empirical objectives are to provide a descriptive profile of Canadian’s expectations that people will devalue and discriminate against someone with depression, and to explore the relation between experiences of being stigmatized in the year prior to the survey among people having been treated for a mental illness with a selected number of sociodemographic and mental health–related variables. Method: Data were collected by Statistics Canada using a rapid response format on a representative sample of Canadians (n = 10 389) during May and June of 2010. Public expectations of stigma and personal experiences of stigma in the subgroup receiving treatment for a mental illness were measured. Results: Over one-half of the sample endorsed 1 or more of the devaluation discrimination items, indicating that they believed Canadians would stigmatize someone with depression. The item most frequently endorsed concerned employers not considering an application from someone who has had depression. Over one-third of people who had received treatment in the year prior to the survey reported discrimination in 1 or more life domains. Experiences of discrimination were strongly associated with perceptions that Canadians would devalue someone with depression, younger age (12 to 15 years), and self-reported poor general mental health. Conclusions: The Mental Health Experiences Module reflects an important partnership between 2 national organizations that will help Canada fulfill its monitoring obligations under the United Nations Convention on the Rights of Persons with Disabilities and provide a legacy to researchers and policy-makers who are interested in monitoring changes in stigma over time. PMID:25565699
Milton, Alyssa C; Ellis, Louise A; Davenport, Tracey A; Burns, Jane M; Hickie, Ian B
Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item's susceptibility to exaggeration on underreporting and to assess whether certain subgroups demonstrated this effect to a greater extent. A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation. For high sensitivity items, a regression analysis showed respondents who were male
Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian
Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
Della Apriyani Kusuma Putri
Full Text Available Kemampuan literasi sains adalah suatu kemampuan yang memungkinkan seseorang untuk membuat suatu keputusan dengan pengetahuan konsep dan proses sains yang dimilikinya. Berbagai macam permasalahan yang terjadi di era globalisasi ini menuntut siswa untuk tidak hanya cakap dalam aspek kognitif tapi juga mampu memberi keputusan untuk memecahkan permasalahan, sehingga dapat dikatakan bahwa kemampuan literasi sains adalah kemampuan yang penting dan harus dimiliki siswa. Oleh karena itu, dibutuhkan instrumen untuk mengukur kemampuan literasi sains. hal inilah yang mendasari peneliti mengembangkan instrumen kemampuan literasi sains. Tujuan penelitian ini adalah untuk mengembangkan dan mengetahui karakteristik tes kemampuan literasi sains fisika siswa SMA pada materi momentum dan impuls berdasarkan aspek literasi sains yang dikemukakan oleh Gormally. Metode penelitian yang diterapkan adalah penelitian dan pengembangan (Research and Development yaitu metode penelitian yang digunakan untuk menghasilkan produk tertentu, dan menguji keefektifan produk tersebut. Sebelum diuji coba tes telah divalidasi oleh tiga orang validator dan menghasilkan kesimpulan bahwa tes cukup baik dan dapat diuji coba. Hasil analisis menggunakan Item Response Theory menunjukkan bahwa model 3PL adalah model yang sesuai dengan karakteristik tes. Sedangkan karakteristik tes yang meliputi daya pembeda, tingkat kesukaran, dan faktor tebakan termasuk dalam kategori baik. Science literacy skills is an ability that allows one to make a decision with the knowledge of the concepts and processes of science has. A wide variety of problems that occur in a globalized world requires students to not only proficient in cognitive but also able to make a decision to solve the problem, so it can be said that the ability of science literacy is an important capability and must be owned by the students. Therefore, the instrument is required to measure the ability of science literacy. This problem is
Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E
The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
Oude Voshaar, Martijn A H; Schenk, Olga; Ten Klooster, Peter M; Vonkeman, Harald E; Bernelot Moens, Hein J; Boers, Maarten; van de Laar, Mart A F J
To further simplify the simple erosion narrowing score (SENS) by removing scored areas that contribute the least to its measurement precision according to analysis based on item response theory (IRT) and to compare the measurement performance of the simplified version to the original. Baseline and 18-month data of the Combinatietherapie Bij Reumatoide Artritis (COBRA) trial were modeled using longitudinal IRT methodology. Measurement precision was evaluated across different levels of structural damage. SENS was further simplified by omitting the least reliably scored areas. Discriminant validity of SENS and its simplification were studied by comparing their ability to differentiate between the COBRA and sulfasalazine arms. Responsiveness was studied by comparing standardized change scores between versions. SENS data showed good fit to the IRT model. Carpal and feet joints contributed the least statistical information to both erosion and joint space narrowing scores. Omitting the joints of the foot reduced measurement precision for the erosion score in cases with below-average levels of structural damage (relative efficiency compared with the original version ranged 35-59%). Omitting the carpal joints had minimal effect on precision (relative efficiency range 77-88%). Responsiveness of a simplified SENS without carpal joints closely approximated the original version (i.e., all Δ standardized change scores were ≤0.06). Discriminant validity was also similar between versions for both the erosion score (relative efficiency = 97%) and the SENS total score (relative efficiency = 84%). Our results show that the carpal joints may be omitted from the SENS without notable repercussion for its measurement performance. © 2016, American College of Rheumatology.
Meijer, Rob R; Egberink, Iris J L; Emons, Wilco H M; Sijtsma, Klaas
We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children (Harter, 1985) in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with care and caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
Holden, Libby; Lee, Christina; Hockey, Richard; Ware, Robert S; Dobson, Annette J
This study aimed to validate a 6-item 1-factor global measure of social support developed from the Medical Outcomes Study Social Support Survey (MOS-SSS) for use in large epidemiological studies. Data were obtained from two large population-based samples of participants in the Australian Longitudinal Study on Women's Health. The two cohorts were aged 53-58 and 28-33 years at data collection (N = 10,616 and 8,977, respectively). Items selected for the 6-item 1-factor measure were derived from the factor structure obtained from unpublished work using an earlier wave of data from one of these cohorts. Descriptive statistics, including polychoric correlations, were used to describe the abbreviated scale. Cronbach's alpha was used to assess internal consistency and confirmatory factor analysis to assess scale validity. Concurrent validity was assessed using correlations between the new 6-item version and established 19-item version, and other concurrent variables. In both cohorts, the new 6-item 1-factor measure showed strong internal consistency and scale reliability. It had excellent goodness-of-fit indices, similar to those of the established 19-item measure. Both versions correlated similarly with concurrent measures. The 6-item 1-factor MOS-SSS measures global functional social support with fewer items than the established 19-item measure.
Ayis, Salma A; Ayerbe, Luis; Ashworth, Mark; DA Wolfe, Charles
Variations have been reported in the number of underlying constructs and choice of thresholds that determine caseness of anxiety and /or depression using the Hospital Anxiety and Depression scale (HADS). This study examined the properties of each item of HADS as perceived by stroke patients, and assessed the information these items convey about anxiety and depression between 3 months to 5 years after stroke. The study included 1443 stroke patients from the South London Stroke Register (SLSR). The dimensionality of HADS was examined using factor analysis methods, and items' properties up to 5 years after stroke were tested using Item Response Theory (IRT) methods, including graded response models (GRMs). The presence of two dimensions of HADS (anxiety and depression) for stroke patients was confirmed. Items that accurately inferred about the severity of anxiety and depression, and offered good discrimination of caseness were identified as "I can laugh and see the funny side of things" (Q4) and "I get sudden feelings of panic" (Q13), discrimination 2.44 (se = 0.26), and 3.34 (se = 0.35), respectively. Items that shared properties, hence replicate inference were: "I get a sort of frightened feeling as if something awful is about to happen" (Q3), "I get a sort of frightened feeling like butterflies in my stomach" (Q6), and "Worrying thoughts go through my mind" (Q9). Item properties were maintained over time. Approximately 20% of patients were lost to follow up. A more concise selection of items based on their properties, would provide a precise approach for screening patients and for an optimal allocation of patients into clinical trials. Copyright © 2017 Elsevier B.V. All rights reserved.
Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua
Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with…
Mahoney, Alison E J; Hobbs, Megan J; Newby, Jill M; Williams, Alishia D; Andrews, Gavin
Cognitive models of generalized anxiety disorder (GAD) suggest that maladaptive behaviours may contribute to the maintenance of the disorder; however, little research has concentrated on identifying and measuring these behaviours. To address this gap, the Worry Behaviors Inventory (WBI) was developed and has been evaluated within a classical test theory (CTT) approach. As CTT is limited in several important respects, this study examined the psychometric properties of the WBI using an Item Response Theory approach. A large sample of adults commencing treatment for their symptoms of GAD (n = 537) completed the WBI in addition to measures of GAD and depression symptom severity. Patients with a probable diagnosis of GAD typically engaged in four or five maladaptive behaviours most or all of the time in an attempt to prevent, control or avoid worrying about everyday concerns. The two-factor structure of the WBI was confirmed, and the WBI scales demonstrated good reliability across a broad range of the respective scales. Together with previous findings, our results suggested that hypervigilance and checking behaviours, as well as avoidance of saying or doing things that are worrisome, were the most relevant maladaptive behaviours associated with GAD, and discriminated well between adults with low, moderate and high degrees of the respective WBI scales. Our results support the importance of maladaptive behaviours to GAD and the utility of the WBI to index these behaviours. Ramifications for the classification, theoretical conceptualization and treatment of GAD are discussed.
Nguyen, Tam H.; Han, Hae-Ra; Kim, Miyong T.
The growing emphasis on patient-centered care has accelerated the demand for high-quality data from patient-reported outcome (PRO) measures. Traditionally, the development and validation of these measures has been guided by classical test theory. However, item response theory (IRT), an alternate measurement framework, offers promise for addressing practical measurement problems found in health-related research that have been difficult to solve through classical methods. This paper introduces foundational concepts in IRT, as well as commonly used models and their assumptions. Existing data on a combined sample (n = 636) of Korean American and Vietnamese American adults who responded to the High Blood Pressure Health Literacy Scale and the Patient Health Questionnaire-9 are used to exemplify typical applications of IRT. These examples illustrate how IRT can be used to improve the development, refinement, and evaluation of PRO measures. Greater use of methods based on this framework can increase the accuracy and efficiency with which PROs are measured. PMID:24403095
Backman, Annica; Sjögren, Karin; Lindkvist, Marie; Lövheim, Hugo; Edvardsson, David
To identify characteristics of highly rated leadership in nursing homes. An ageing population entails fundamental social, economic and organizational challenges for future aged care. Knowledge is limited of both specific leadership behaviours and organizational and managerial characteristics which have an impact on the leadership of contemporary nursing home care. Cross-sectional. From 290 municipalities, 60 were randomly selected and 35 agreed to participate, providing a sample of 3605 direct-care staff employed in 169 Swedish nursing homes. The staff assessed their managers' (n = 191) leadership behaviours using the Leadership Behaviour Questionnaire. Data were collected from November 2013 - September 2014, and the study was completed in November 2016. A two-parameter item response theory approach and regression analyses were used to identify specific characteristics of highly rated leadership. Five specific behaviours of highly rated nursing home leadership were identified; that the manager: experiments with new ideas; controls work closely; relies on subordinates; coaches and gives direct feedback; and handles conflicts constructively. The regression analyses revealed that managers with social work backgrounds and privately run homes were significantly associated with higher leadership ratings. This study highlights the five most important leadership behaviours that characterize those nursing home managers rated highest in terms of leadership. Managers in privately run nursing homes and managers with social work backgrounds were associated with higher leadership ratings. Further work is needed to explore these behaviours and factors predictive of higher leadership ratings. © 2017 John Wiley & Sons Ltd.
Davenport, Tracey A; Burns, Jane M; Hickie, Ian B
Background Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). Objective The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item’s susceptibility to exaggeration on underreporting and to assess whether certain subgroups demonstrated this effect to a greater extent. Methods A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. Results When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation. For high sensitivity items, a regression
Cristea, Ioana Alina; Ioannidis, John P A
P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often containing the main results, are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and, respectively, in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%) and rarely (0.7%) articles relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome.
Sara Mortaz Hejri
Full Text Available Purpose In a sequential objective structured clinical examination (OSCE, all students initially take a short screening OSCE. Examinees who pass are excused from further testing, but an additional OSCE is administered to the remaining examinees. Previous investigations of sequential OSCE were based on classical test theory. We aimed to design and evaluate screening OSCEs based on item response theory (IRT. Methods We carried out a retrospective observational study. At each station of a 10-station OSCE, the students’ performance was graded on a Likert-type scale. Since the data were polytomous, the difficulty parameters, discrimination parameters, and students’ ability were calculated using a graded response model. To design several screening OSCEs, we identified the 5 most difficult stations and the 5 most discriminative ones. For each test, 5, 4, or 3 stations were selected. Normal and stringent cut-scores were defined for each test. We compared the results of each of the 12 screening OSCEs to the main OSCE and calculated the positive and negative predictive values (PPV and NPV, as well as the exam cost. Results A total of 253 students (95.1% passed the main OSCE, while 72.6% to 94.4% of examinees passed the screening tests. The PPV values ranged from 0.98 to 1.00, and the NPV values ranged from 0.18 to 0.59. Two tests effectively predicted the results of the main exam, resulting in financial savings of 34% to 40%. Conclusion If stations with the highest IRT-based discrimination values and stringent cut-scores are utilized in the screening test, sequential OSCE can be an efficient and convenient way to conduct an OSCE.
Hejri, Sara Mortaz; Jalili, Mohammad
In a sequential objective structured clinical examination (OSCE), all students initially take a short screening OSCE. Examinees who pass are excused from further testing, but an additional OSCE is administered to the remaining examinees. Previous investigations of sequential OSCE were based on classical test theory. We aimed to design and evaluate screening OSCEs based on item response theory (IRT). We carried out a retrospective observational study. At each station of a 10-station OSCE, the students' performance was graded on a Likert-type scale. Since the data were polytomous, the difficulty parameters, discrimination parameters, and students' ability were calculated using a graded response model. To design several screening OSCEs, we identified the 5 most difficult stations and the 5 most discriminative ones. For each test, 5, 4, or 3 stations were selected. Normal and stringent cut-scores were defined for each test. We compared the results of each of the 12 screening OSCEs to the main OSCE and calculated the positive and negative predictive values (PPV and NPV), as well as the exam cost. A total of 253 students (95.1%) passed the main OSCE, while 72.6% to 94.4% of examinees passed the screening tests. The PPV values ranged from 0.98 to 1.00, and the NPV values ranged from 0.18 to 0.59. Two tests effectively predicted the results of the main exam, resulting in financial savings of 34% to 40%. If stations with the highest IRT-based discrimination values and stringent cut-scores are utilized in the screening test, sequential OSCE can be an efficient and convenient way to conduct an OSCE.
Ágoston, Csilla; Urbán, Róbert; Richman, Mara J; Demetrovics, Zsolt
Caffeine is a common psychoactive substance with a documented addictive potential. Caffeine withdrawal has been included in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), but caffeine use disorder (CUD) is considered to be a condition for further study. The aim of the current study is (1) to test the psychometric properties of the Caffeine Use Disorder Questionnaire (CUDQ) by using a confirmatory factor analysis and an item response theory (IRT) approach, (2) to compare IRT models with varying numbers of parameters and models with or without caffeine consumption criteria, and (3) to examine if the total daily caffeine consumption and the use of different caffeinated products can predict the magnitude of CUD symptomatology. A cross-sectional study was conducted on an adult sample (N = 2259). Participants answered several questions regarding their caffeine consumption habits and completed the CUDQ, which incorporates the nine proposed criteria of the DSM-5 as well as one additional item regarding the suffering caused by the symptoms. Factor analyses demonstrated the unidimensionality of the CUDQ. The suffering criterion had the highest discriminative value at a higher degree of latent trait. The criterion of failure to fulfill obligations and social/interpersonal problems discriminate only at the higher value of CUD latent factor, while endorsement the consumption of more caffeine or longer than intended and craving criteria were discriminative at a lower level of CUD. Total daily caffeine intake was related to a higher level of CUD. Daily coffee, energy drink, and cola intake as dummy variables were associated with the presence of more CUD symptoms, while daily tea consumption as a dummy variable was related to less CUD symptoms. Regular smoking was associated with more CUD symptoms, which was explained by a larger caffeine consumption. The IRT approach helped to determine which CUD symptoms indicate more severity and have a greater
Full Text Available Abstract Background The 12-item Short Form Health Survey (SF-12 as a shorter alternative of the SF-36 is largely used in health outcomes surveys. The aim of this study was to validate the SF-12 in Iran. Methods A random sample of the general population aged 15 years and over living in Tehran, Iran completed the SF-12. Reliability was estimated using internal consistency and validity was assessed using known groups comparison and convergent validity. In addition, the factor structure of the questionnaire was extracted by performing both exploratory factor analysis (EFA and confirmatory factor analysis (CFA. Results: In all, 5587 individuals were studied (2721 male and 2866 female. The mean age and formal education of the respondents were 35.1 (SD = 15.4 and 10.2 (SD = 4.4 years respectively. The results showed satisfactory internal consistency for both summary measures, that are the Physical Component Summary (PCS and the Mental Component Summary (MCS; Cronbach's α for PCS-12 and MCS-12 was 0.73 and 0.72, respectively. Known-groups comparison showed that the SF-12 discriminated well between men and women and those who differed in age and educational status (P Conclusion In general the findings suggest that the SF-12 is a reliable and valid measure of health related quality of life among Iranian population. However, further studies are needed to establish stronger psychometric properties for this alternative form of the SF-36 Health Survey in Iran.
Full Text Available The Multiple Sclerosis Quality of Life-54 (MSQOL-54, 52 items grouped in 12 subscales plus two single items is the most used MS specific health related quality of life inventory.To develop a shortened version of the MSQOL-54.MSQOL-54 dimensionality and metric properties were investigated by confirmatory factor analysis (CFA and Rasch modelling (Partial Credit Model, PCM on MSQOL-54s completed by 473 MS patients. Their mean age was 41 years, 65% were women, and median Expanded Disability Status Scale (EDSS score was 2.0 (range 0-9.5. Differential item functioning (DIF was evaluated for gender, age and EDSS. Dimensionality of the resulting short version was assessed by exploratory factor analysis (EFA and CFA. Cognitive debriefing of the short instrument (vs. the original was then performed on 12 MS patients.CFA of MSQOL-54 subscales showed that the data fitted the overall model well. Two subscales (Role Limitations--Physical, Role Limitations--Emotional did not fit the PCM, and were removed; two other subscales (Health Perceptions, Social Function did not fit the model, but were retained as single items. Sexual Satisfaction (single-item subscale was also removed. The resulting MSQOL-29 consisted of 25 items grouped in 7 subscales, plus 4 single items. PCM fit statistics were within the acceptability range for all MSQOL-29 items except one which had significant DIF by age. EFA and CFA indicated adequate fit to the original two-factor (Physical and Mental Health Composites hypothesis. Cognitive debriefing confirmed that MSQOL-29 was acceptable and had lost no key items.The proposed MSQOL-29 is 50% shorter than MSQOL-54, yet preserves key quality of life dimensions. Prospective validation on a large, independent MS patient sample is ongoing.
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José
To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011:0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has
Vloedgraven, J.M.T.; Verhoeven, L.T.W.
In the present study, the nature of Dutch children's phonological awareness was examined throughout the elementary school grades. Phonological awareness was assessed using five different sets of items that measured rhyming, phoneme identification, phoneme blending, phoneme segmentation, and phoneme
Theoretically, increased levels of physical activity self-efficacy (PASE) should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM) was...
Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie
Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…
Gagne, Phill; Furlow, Carolyn; Ross, Terris
In item response theory (IRT) simulation research, it is often necessary to use one software package for data generation and a second software package to conduct the IRT analysis. Because this can substantially slow down the simulation process, it is sometimes offered as a justification for using very few replications. This article provides…
van Rijn, Peter W.; Rijmen, Frank
Hooker and colleagues addressed a paradoxical situation that can arise in the application of multidimensional item response theory (MIRT) models to educational test data. We demonstrate that this MIRT paradox is an instance of the explaining-away phenomenon in Bayesian networks, and we attempt to enhance the understanding of MIRT models by placing…
Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry
The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…
de Jong, M.G.; Pieters, R.; Stremersch, S.
Answers to sensitive questions are prone to social desirability bias. If not properly addressed, the validity of the research can be suspect. This article presents multigroup item randomized response theory (MIRRT) to measure self-reported sensitive topics across cultures. The method was
Klinkenberg, S.; Straatemeier, M.; van der Maas, H.L.J.
In this paper we present a model for computerized adaptive practice and monitoring. This model is used in the Maths Garden, a web-based monitoring system, which includes a challenging web environment for children to practice arithmetic. Using a new item response model based on the Elo (1978) rating
Tay, Louis; Drasgow, Fritz
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Vitterso, Joar; Biswas-Diener, Robert; Diener, Ed
Cultural differences in response to the Satisfaction With Life Scale (SWLS) items is investigated. Data were fit to a mixed Rasch model in order to identify latent classes of participants in a combined sample of Norwegians (N = 461) and Greenlanders (N = 180). Initial analyses showed no mean difference in life satisfaction between the two…
Villaseñor-Ovies, Pablo; Navarro-Zarza, José Eduardo; Saavedra, Miguel Ángel; Hernández-Díaz, Cristina; Canoso, Juan J; Biundo, Joseph J; Kalish, Robert A; de Toro Santos, Francisco Javier; McGonagle, Dennis; Carette, Simon; Alvarez-Nemegyei, José
This study aimed to identify the anatomical items of the upper extremity and spine that are potentially relevant to the practice of rheumatology. Ten rheumatologists interested in clinical anatomy who published, taught, and/or participated as active members of Clinical Anatomy Interest groups (six seniors, four juniors), participated in a one-round relevance Delphi exercise. An initial, 560-item list that included 45 (8.0 %) general concepts items; 138 (24.8 %) hand items; 100 (17.8 %) forearm and elbow items; 147 (26.2 %) shoulder items; and 130 (23.2 %) head, neck, and spine items was compiled by 5 of the participants. Each item was graded for importance with a Likert scale from 1 (not important) to 5 (very important). Thus, scores could range from 10 (1 × 10) to 50 (5 × 10). An item score of ≥40 was considered most relevant to competent practice as a rheumatologist. Mean item Likert scores ranged from 2.2 ± 0.5 to 4.6 ± 0.7. A total of 115 (20.5 %) of the 560 initial items reached relevance. Broken down by categories, this final relevant item list was composed by 7 (6.1 %) general concepts items; 32 (27.8 %) hand items; 20 (17.4 %) forearm and elbow items; 33 (28.7 %) shoulder items; and 23 (17.6 %) head, neck, and spine items. In this Delphi exercise, a group of practicing academic rheumatologists with an interest in clinical anatomy compiled a list of anatomical items that were deemed important to the practice of rheumatology. We suggest these items be considered curricular priorities when training rheumatology fellows in clinical anatomy skills and in programs of continuing rheumatology education.
Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Verbeke, Geert; De Boeck, Paul
For item response theory (IRT) models, which belong to the class of generalized linear or non-linear mixed models, reliability at the scale of observed scores (i.e., manifest correlation) is more difficult to calculate than latent correlation based reliability, but usually of greater scientific interest. This is not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As such, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, allegedly because it is easy to do so. Cronbach's α has well-known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to correlation between observed scores, while latent correlation refers to correlation between scores at the latent (e.g., logit or probit) scale. Thus, using one in place of the other can lead to erroneous conclusions. Taylor series based reliability measures, which are based on manifest correlation functions, are derived and a careful comparison of reliability measures based on latent correlations, Fisher's information, and exact reliability is carried out. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor series based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given the light computational burden and the performance of Taylor series based reliability measures, their use is recommended. © 2014 The British Psychological Society.
Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…
Cao, Yi; Lu, Ru; Tao, Wei
The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
Shannon, David M.; Bradshaw, Carol C.
Compared response rates, response time, and costs of mail and electronic surveys using a sample of 377 college faculty members. Mail surveys yielded a higher response rate and a lower rate of undeliverable surveys, but response time was longer and costs were higher than for electronic surveys. (SLD)
Lee, Bokim; Jung, Hye-Sun
The researchers conducted a cross-sectional survey to determine the relationship between handling heavy items during pregnancy and spontaneous abortion among working women in South Korea. One thousand working women were selected from a database of those eligible for maternity benefits under the National Employment Insurance Plan. Study results showed that handling heavy items during pregnancy was associated with an increased risk of spontaneous abortion after adjusting for general characteristics of the participants and their work environment. A collective effort is needed on the parts of employers, employees, occupational health nurses, and the government to protect working women from lifting heavy items while pregnant. Copyright 2012, SLACK Incorporated.
Montazeri, Ali; Vahdaninia, Mariam; Mousavi, Sayed Javad; Omidvari, Speideh
The 12-item Short Form Health Survey (SF-12) as a shorter alternative of the SF-36 is largely used in health outcomes surveys. The aim of this study was to validate the SF-12 in Iran. A random sample of the general population aged 15 years and over living in Tehran, Iran completed the SF-12. Reliability was estimated using internal consistency and validity was assessed using known groups comparison and convergent validity. In addition, the factor structure of the questionnaire was extracted by performing both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). In all, 5587 individuals were studied (2721 male and 2866 female). The mean age and formal education of the respondents were 35.1 (SD = 15.4) and 10.2 (SD = 4.4) years respectively. The results showed satisfactory internal consistency for both summary measures, that are the Physical Component Summary (PCS) and the Mental Component Summary (MCS); Cronbach's alpha for PCS-12 and MCS-12 was 0.73 and 0.72, respectively. Known-groups comparison showed that the SF-12 discriminated well between men and women and those who differed in age and educational status (P < 0.001). In addition, correlations between the SF-12 scales and single items showed that the physical functioning, role physical, bodily pain and general health subscales correlated higher with the PCS-12 score, while the vitality, social functioning, role emotional and mental health subscales more correlated with the MCS-12 score lending support to its good convergent validity. Finally the principal component analysis indicated a two-factor structure (physical and mental health) that jointly accounted for 57.8% of the variance. The confirmatory factory analysis also indicated a good fit to the data for the two-latent structure (physical and mental health). In general the findings suggest that the SF-12 is a reliable and valid measure of health related quality of life among Iranian population. However, further studies are needed to
Shaw, Amanda M; Rogge, Ronald D
This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.
Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A
Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS ® ) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical
Wellman, Robert J; Edelen, Maria Orlando; DiFranza, Joseph R
The Autonomy over Tobacco Scale (AUTOS) is composed of 12-symptoms of nicotine dependence. While it has demonstrated excellent reliability and validity, several psychometric properties have yet to be investigated. We aimed to determine (1) whether items functioned differently across demographic groups, (2) the likelihood that individual symptoms would be endorsed by smokers at different levels of diminished autonomy, and (3) the degree of information provided by each item and the reliability of the full AUTOS across the range of diminished autonomy. Data for this study come from two convenience samples of American adult current smokers (n=777; 69% female; 88% white; Mage=34 years, range: 18-78), of whom 66% were daily smokers (Mcigarettes/smoking day=10.1, range: AUTOS online as part of "a research study about the experiences people have when they smoke." After p value correction, items remained invariant across sex and minority status, while two items functioned differently according to age, with minimal impact on the total AUTOS score. Discriminative power of the items was high. The greatest amount of information is provided at just under one-half SD above the mean and the least at the extremes of diminished autonomy. The AUTOS maintains acceptable reliability (>0.70) across the range of diminished autonomy within which more than 95% of smokers' scores could be anticipated to fall. The AUTOS is a versatile and psychometrically sound instrument for measuring the loss of autonomy over tobacco use. Copyright © 2015 Elsevier Ltd. All rights reserved.
Marino, Molly E; Dore, Emily C; Ni, Pengsheng; Ryan, Colleen M; Schneider, Jeffrey C; Acton, Amy; Jette, Alan M; Kazis, Lewis E
To develop self-reported short forms for the Life Impact Burn Recovery Evaluation (LIBRE) Profile. Short forms based on the item parameters of discrimination and average difficulty. A support network for burn survivors, peer support networks, social media, and mailings. Burn survivors (N=601) older than 18 years. Not applicable. The LIBRE Profile. Ten-item short forms were developed to cover the 6 LIBRE Profile scales: Relationships with Family & Friends, Social Interactions, Social Activities, Work & Employment, Romantic Relationships, and Sexual Relationships. Ceiling effects were ≤15% for all scales; floor effects were item bank, computerized adaptive test, and short forms are all scored along the same metric, and therefore scores are comparable regardless of the mode of administration. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Brandão, Tânia; Schulz, Marc S; Gross, James J; Matos, Paula Mena
Emotion regulation is thought to play an important role in adaptation to cancer. However, the emotion regulation questionnaire (ERQ), a widely used instrument to assess emotion regulation, has not yet been validated in this context. This study addresses this gap by examining the psychometric properties of the ERQ in a sample of Portuguese women with cancer. The ERQ was administered to 204 women with cancer (mean age = 48.89 years, SD = 7.55). Confirmatory factor analysis and item response theory analysis were used to examine psychometric properties of the ERQ. Confirmatory factor analysis confirmed the 2-factor solution proposed by the original authors (expressive suppression and cognitive reappraisal). This solution was invariant across age and type of cancer. Item response theory analyses showed that all items were moderately to highly discriminant and that items are better suited for identifying moderate levels of expressive suppression and cognitive reappraisal. Support was found for the internal consistency and test-retest reliability of the ERQ. The pattern of relationships with emotional control, alexithymia, emotional self-efficacy, attachment, and quality of life provided evidence of the convergent and concurrent validity for both dimensions of the ERQ. Overall, the ERQ is a psychometrically sound approach for assessing emotion regulation strategies in the oncological context. Clinical implications are discussed. Copyright © 2016 John Wiley & Sons, Ltd.
Emons, Wilco H M; Meijer, Rob R; Denollet, Johan
Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI)-referred to as type-D personality-are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The objectives of this study were (a) to evaluate the relative contribution of individual items to the measurement precision at the cutoff to distinguish type-D from non-type-D personality and (b) to investigate the comparability of NA, SI, and type-D constructs across the general population and clinical populations. Data from representative samples including 1316 respondents from the general population, 427 respondents diagnosed with coronary heart disease, and 732 persons suffering from hypertension were analyzed using the graded response IRT model. In Study 1, the information functions obtained in the IRT analysis showed that (a) all items had highest measurement precision around the cutoff and (b) items are most informative at the higher end of the scale. In Study 2, the IRT analysis showed that measurements were fairly comparable across the general population and clinical populations. The DS14 adequately measures NA and SI, with highest reliability in the trait range around the cutoff. The DS14 is a valid instrument to assess and compare type-D personality across clinical groups.
Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri
The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
Polak, Maaike Geertruida
The thesis explains the fundamental difference between unipolar and bipolar measurement scales for psychological characteristics. We explore the use of correspondence analysis (CA), a technique that is similar to principal component analysis and is available in SAS and SPSS, to select items that
van der Linden, Willem J.; Vos, Hendrik J.; Chang, Lei
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of
Eleje, Lydia I.; Esomonu, Nkechi P. M.
A Test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up 20 multiple choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…
Liu, Ou Lydia; Brew, Chris; Blackmore, John; Gerard, Libby; Madhok, Jacquie; Linn, Marcia C.
Content-based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept-based scoring tool for content-based scoring, c-rater™, for four science items with rubrics…
Danielle Ramos de Miranda Pereira
Full Text Available A constatação da ampla utilização de escalas multidimensionais por parte dos pesquisadores da área de marketing motivou a elaboração de um artigo com o propósito de discutir a aplicação da Teoria da Resposta ao Item (TRI, bem como apresentar a essa área um método que tem se mostrado bastante eficaz na estimação de construtos comportamentais. Sendo assim, o artigo apresenta uma discussão sobre a TRI, ressaltando seus avanços em relação à Teoria Clássica do Teste (TCT e suas aplicações tradicionais no campo da psicometria e da avaliação educacional. Para verificar sua aplicabilidade nos estudos de marketing, julgou-se adequado conduzir uma aplicação prática da TRI em um estudo envolvendo uma escala já bastante utilizada pelos pesquisadores - a de orientação de mercado (Escala MkTor proposta por Narver e Slater (1990. Os resultados da aplicação demonstraram que, embora o modelo da TRI proposto possa ser considerado satisfatório para a aplicação no contexto da Orientação para o Mercado, existem muitos desafios a serem enfrentados por novos estudos como a construção de uma escala com interpretação prática, indicando o que significa para uma empresa possuir um nível de maturidade associado a um determinado construto. As considerações finais ressaltam que a grande contribuição do artigo aos estudos em marketing é a apresentação de um método alternativo para estimar de forma mais apurada os construtos e avaliar a qualidade dos itens das escalas.The widespread utilization of multidimensional scales by researchers in field of marketing have motivated the conduction of a study to discuss the application of the Item Response Theory (IRT as well as presenting a method that has proved very effective in the estimation of behavioral constructs. Therefore, this article presents a discussion about IRT highlighting its advances regarding the Classical Theory of Tests (CTT and its traditional applications in the
Guenole, Nigel; Brown, Anna A; Cooper, Andrew J
This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
Standard Errors for B1 Bell-shaped distribution Rectangular Item b Bn-45 n=90 n-45 n=45 -No. i i N-1500 N=1500 N-6000 N=1500 1 -2.01 -1.75 0.516 0.466...34th Streets Lawrence, KS 66045 Baltimore, MD 21218 ENIC Facility-Acquisitions 1 Dr. Ron Hambleton 4t33 Rugby Avenue School of Education Lcthesda, !ID
Aaro, Leif E.; Breivik, Kyrre; Klepp, Knut-Inge; Kaaya, Sylvia; Onya, Hans E.; Wubs, Annegreet; Helleve, Arnfinn; Flisher, Alan J.
A 14-item human immunodeficiency virus/acquired immunodeficiency syndrome knowledge scale was used among school students in 80 schools in 3 sites in Sub-Saharan Africa (Cape Town and Mankweng, South Africa, and Dar es Salaam, Tanzania). For each item, an incorrect or don't know response was coded as 0 and correct response as 1. Exploratory factor…
Aderka, Idan M; Pollack, Mark H; Simon, Naomi M; Smits, Jasper A J; Van Ameringen, Michael; Stein, Murray B; Hofmann, Stefan G
The Social Phobia Inventory (SPIN) is a widely used measure in mental health settings and a 3-item version (mini-SPIN) has been developed as a screening instrument for social anxiety disorder. In the present study, we examined the psychometric properties of the SPIN and developed a brief version (mini-SPIN-R) designed to assess social anxiety severity using item response theory. Our sample included 569 individuals with social anxiety disorder who participated in 2 clinical trials and filled out a battery of self-report measures. Using a nonparametric kernel smoothing method we identified the most sensitive items of the SPIN. These 3 items comprised the mini-SPIN-R, which was found to have greater internal consistency, and to capture a greater range of symptoms compared to the mini-SPIN. The mini-SPIN-R evidenced superior convergent validity compared to the mini-SPIN and both measures had similar divergent validity. Thus, the mini-SPIN-R is a promising brief measure of social anxiety severity. Copyright © 2013. Published by Elsevier Ltd.
Y.S. Liem (Ylian Serina); J.L. Bosch (Johanna); L.R. Arends (Lidia); M.H. Heijenbrok-Kal (Majanka); M.G.M. Hunink (Myriam)
textabstractObjectives: The Medical Outcomes Study Short Form 36-Item Health Survey (SF-36) is the most widely used generic instrument to estimate quality of life of patients on renal replacement therapy. Purpose of this study was to summarize and compare the published literature on quality of
Kapuza, A. V.; Tyumeneva, Yu. A.
One of the ways of controlling for the influence of social expectations on the answers given by survey respondents is to use a social desirability scale together with the main questions. The social desirability scale, which was included in the Teaching and Learning International Survey (TALIS) international comparative study for this purpose, was…
Hence the discussion is illustrated with real examples from surveys (in particular 2003 KDHS) conducted by Central Bureau of Statistics (CBS) - Kenya. Some suggestions are made for improving the quality of non-response weighting. Keywords: Survey non-response; non-response adjustment factors; weighting; sampling ...
substituting (15) into (16) and solving for k and K k = b b1 - o K , (17)k where b and b are means for m and r items, respectively. To find the variance...C5 , and C12 were treated as known. We find that the standard errors of B1 to B5 are increased drastically by ignorance of C 1 to C5 ; all...ERIC Facilltv-Acquisitlons Davie Hall 013A 4833 Rugby Avenue Chapel Hill, NC 27514 Bethesda, MD 20014 -7- Dr. A. J. Eschenbrenner 1 Dr. John R
Full Text Available Abstract Background Theoretically, increased levels of physical activity self-efficacy (PASE should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM was used to develop new physical activity and sedentary behavior change self-efficacy scales. The validity of the new scales was compared with accelerometer assessments of physical activity and sedentary behavior. Methods New PASE and sedentary behavior change (TV viewing, computer video game use, and telephone use self-efficacy items were developed. The scales were completed by 714, 6th grade students in seven US cities. A limited number of participants (83 also wore an accelerometer for five days and provided at least 3 full days of complete data. The new scales were analyzed using Classical Test Theory (CTT and IRM; a reduced set of items was produced with IRM and correlated with accelerometer counts per minute and minutes of sedentary, light and moderate to vigorous activity per day after school. Results The PASE items discriminated between high and low levels of PASE. Full and reduced scales were weakly correlated (r = 0.18 with accelerometer counts per minute after school for boys, with comparable associations for girls. Weaker correlations were observed between PASE and minutes of moderate to vigorous activity (r = 0.09 – 0.11. The uni-dimensionality of the sedentary scales was established by both exploratory factor analysis and the fit of items to the underlying variable and reliability was assessed across the length of the underlying variable with some limitations. The reduced sedentary behavior scales had poor reliability. The full scales were moderately correlated with light intensity physical activity after school (r = 0.17 to 0.33 and sedentary behavior (r = -0.29 to -0.12 among the boys, but not for girls. Conclusion New
Peipert, John D; Bentler, Peter M; Klicko, Kristi; Hays, Ron D
The Centers for Medicare & Medicaid Services require that dialysis patients' health-related quality of life be assessed annually. The primary instrument used for this purpose is the Kidney Disease Quality of Life 36-Item Short-Form Survey (KDQOL-36), which includes the SF-12 as its generic core and 3 kidney disease-targeted scales: Burden of Kidney Disease, Symptoms and Problems of Kidney Disease, and Effects of Kidney Disease. Despite its broad use, there has been limited evaluation of KDQOL-36's psychometric properties. Secondary analyses of data collected by the Medical Education Institute to evaluate the reliability and factor structure of the KDQOL-36 scales. KDQOL-36 responses from 70,786 dialysis patients in 1,381 US dialysis facilities that permitted data analysis were collected from June 1, 2015, through May 31, 2016, as part of routine clinical assessment. We assessed the KDQOL-36 scales' internal consistency reliability and dialysis facility-level reliability using coefficient alpha and 1-way analysis of variance. We evaluated the KDQOL-36's factor structure using item-to-total scale correlations and confirmatory factor analysis. Construct validity was examined using correlations between SF-12 and KDQOL-36 scales and "known groups" analyses. Each of the KDQOL-36's kidney disease-targeted scales had acceptable internal consistency reliability (α=0.83-0.85) and facility-level reliability (r=0.75-0.83). Item-scale correlations and a confirmatory factor analysis model evidenced the KDQOL-36's original factor structure. Construct validity was supported by large correlations between the SF-12 Physical Component Summary and Mental Component Summary (r=0.40-0.52) and the KDQOL-36 scale scores, as well as significant differences on the scale scores between patients receiving different types of dialysis, diabetic and nondiabetic patients, and patients who were employed full-time versus not. Use of secondary data from a clinical registry. The study provides
Khan, Mohammad K A; Faught, Erin L; Chu, Yen Li; Ekwaru, John P; Storey, Kate E; Veugelers, Paul J
Both diet quality and sleep duration of children have declined in the past decades. Several studies have suggested that diet and sleep are associated; however, it is not established which aspects of the diet are responsible for this association. Is it nutrients, food items, diet quality or eating behaviours? We surveyed 2261 grade 5 children on their dietary intake and eating behaviours, and their parents on their sleep duration and sleep quality. We performed factor analysis to identify and quantify the essential factors among 57 nutrients, 132 food items and 19 eating behaviours. We considered these essential factors along with a diet quality score in multivariate regression analyses to assess their independent associations with sleep. Nutrients, food items and diet quality did not exhibit independent associations with sleep, whereas two groupings of eating behaviours did. 'Unhealthy eating habits and environments' was independently associated with sleep. For each standard deviation increase in their factor score, children had 6 min less sleep and were 12% less likely to have sleep of good quality. 'Snacking between meals and after supper' was independently associated with sleep quality. For each standard deviation increase in its factor score, children were 7% less likely to have good quality sleep. This study demonstrates that eating behaviours are responsible for the associations of diet with sleep among children. Health promotion programmes aiming to improve sleep should therefore focus on discouraging eating behaviours such as eating alone or in front of the TV, and snacking between meals and after supper. © 2016 European Sleep Research Society.
Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica
The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Park, Jong Cook; Kim, Kwang Sig
The reliability of test is determined by each items' characteristics. Item analysis is achieved by classical test theory and item response theory. The purpose of the study was to compare the discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. Point biserial correlation coefficient (C(pbs)) was compared to method of extreme group (D), biserial correlation coefficient (C(bs)), item-total correlation coefficient (C(it)), and corrected item-total correlation coeffcient (C(cit)). Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. Explanatory power (r2) of Cpbs is decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Item 9 and 23 have outfit > or =1.3. Student 1, 5, 7, 18, 26, 30, and 32 have fit > or =1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. Rasch model can estimate item difficulty parameter and examinee's ability parameter with standard error. The fit statistics can identify bad items and unpredictable examinee's responses.
Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.
Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
Walters, Glenn D
An item response theory (IRT) analysis of the Psychological Inventory of Criminal Thinking Styles (PICTS) was performed on 26,831 (19,067 male and 7,764 female) federal probationers and compared with results obtained on 3,266 (3,039 male and 227 female) prisoners from previous research. Despite the fact male and female federal probationers scored significantly lower on the PICTS thinking style scales than male and female prisoners, discrimination and location parameter estimates for the individual PICTS items were comparable across sex and setting. Consistent with the results of a previous IRT analysis conducted on the PICTS, the current results did not support sentimentality as a component of general criminal thinking. Findings from this study indicate that the discriminative power of the individual PICTS items is relatively stable across sex (male, female) and correctional setting (probation, prison) and that the PICTS may be measuring the same criminal thinking construct in male and female probationers and prisoners. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Background Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options ((levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Methods Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. Results The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminate better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation was able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. Conclusions The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and to allow selection of items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity
Khan, Anzalee; Lewis, Charles; Lindenmayer, Jean-Pierre
Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options ((levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminate better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation was able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and to allow selection of items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.
Stevenson, C.E.; Heiser, W.J.; Resing, W.C.M.
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC
Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.
Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
Sueiro, Manuel J.; Abad, Francisco J.
The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…
Hoertel, Nicolas; Blanco, Carlos; Peyre, Hugo; Wall, Melanie M; McMahon, Kibby; Gorwood, Philip; Lemogne, Cédric; Limosin, Frédéric
The inclusion of subsyndromal forms of bipolarity in the fifth edition of the DSM has major implications for the way in which we approach the diagnosis of individuals with depressive symptoms. The aim of the present study was to use methods based on item response theory (IRT) to examine whether, when equating for levels of depression severity, there are differences in the likelihood of reporting DSM-IV symptoms of major depressive episode (MDE) between subjects with and without a lifetime history of manic symptoms. We conducted these analyses using a large, nationally representative sample from the USA (n=34,653), the second wave of the National Epidemiologic Survey on Alcohol and Related Conditions. The items sadness, appetite disturbance and psychomotor symptoms were better indicators of depression severity in participants without a lifetime history of manic symptoms, in a clinically meaningful way. DSM-IV symptoms of MDE were substantially less informative in participants with a lifetime history of manic symptoms than in those without such history. Clinical information on DSM-IV depressive and manic symptoms was based on retrospective self-report The clinical presentation of depressive symptoms may substantially differ in individuals with and without a lifetime history of manic symptoms. These findings alert to the possibility of atypical symptomatic presentations among individuals with co-occurring symptoms or disorders and highlight the importance of continued research into specific pathophysiology differentiating unipolar and bipolar depression. Copyright © 2016 Elsevier B.V. All rights reserved.
Full Text Available Occupation is key in socioeconomic research. As in other survey modes, most web surveys use an open-ended question for occupation, though the absence of interviewers elicits unidentifiable or aggregated responses. Unlike other modes, web surveys can use a search tree with an occupation database. They are hardly ever used, but this may change due to technical advancements. This article evaluates a three-step search tree with 1,700 occupational titles, used in the 2010 multilingual WageIndicator web survey for UK, Belgium and Netherlands (22,990 observations. Dropout rates are high; in Step 1 due to unemployed respondents judging the question not to be adequate, and in Step 3 due to search tree item length. Median response times are substantial due to search tree item length, dropout in the next step and invalid occupations ticked. Overall the validity of the occupation data is rather good, 1.7-7.5% of the respondents completing the search tree have ticked an invalid occupation.
Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Egeland, Thore
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Applying a recently proposed semantic theory of survey response, we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesized that in many cases, information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. We developed a procedure to separate the semantic influence from attitude strength in individual response patterns, and compared these results to, respectively, the observed sample correlation matrices and the semantic similarity structures arising from text analysis algorithms. This was done with four datasets, comprising a total of 7,787 subjects and 27,461,502 observed item pair responses. As we argued, attitude strength seemed to account for much information about the individual respondents. However, this information did not seem to carry over into the observed sample correlation matrices, which instead converged around the semantic structures offered by the survey items. This is potentially disturbing for the traditional understanding of what survey data represent. We argue that this approach contributes to a better understanding of the cognitive processes involved in survey responses. In turn, this could help us make better use of the data that such methods provide.
Todd, Angela L; Porter, Maree; Williamson, Jennifer L; Patterson, Jillian A; Roberts, Christine L
Surveys are commonly used in health research to assess patient satisfaction with hospital care. Achieving an adequate response rate, in the face of declining trends over time, threatens the quality and reliability of survey results. This paper evaluates a strategy to increase the response rate in a postal satisfaction survey with women who had recently given birth. A sample of 2048 Australian women who had recently given birth at seven maternity units in New South Wales were invited to participate in a postal survey about their recent experiences with maternity care. The study design included a randomised controlled trial that tested two types of pre-notification letter (with or without the option of opting out of the survey). The study also explored the acceptability of a request for consent to link survey data with existing routinely collected health data (omitting the latter data items from the survey reduced survey length and participant burden). This consent was requested of all women. The survey had an overall response rate of 46% (913 completed surveys returned, total sample 1989). Women receiving the pre-notification letter with the option of opting out of the survey were more likely to actively decline to participate than women receiving the letter without this option, although the overall numbers of women declining were small (27 versus 12). Letter type was not significantly associated with the return of a completed survey. Among women who completed the survey, 97% gave consent to link their survey data with existing health data. The two types of pre-notification letters used in our study did not influence the survey response rate. However, seeking consent for record linkage was highly acceptable to women who completed the survey, and represents an important strategy to add to the arsenal for designing and implementing effective surveys. In addition to aspects of survey design, future research should explore how to more effectively influence personal
Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C
This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administrated to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF were found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara
Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50 %. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
Full Text Available Abstract Background As assessment has been shown to direct learning, it is critical that the examinations developed to test clinical competence in medical undergraduates are valid and reliable. The use of extended matching questions (EMQ has been advocated to overcome some of the criticisms of using multiple-choice questions to test factual and applied knowledge. Methods We analysed the results from the Extended Matching Questions Examination taken by 4th year undergraduate medical students in the academic year 2001 to 2002. Rasch analysis was used to examine whether the set of questions used in the examination mapped on to a unidimensional scale, the degree of difficulty of questions within and between the various medical and surgical specialties and the pattern of responses within individual questions to assess the impact of the distractor options. Results Analysis of a subset of items and of the full examination demonstrated internal construct validity and the absence of bias on the majority of questions. Three main patterns of response selection were identified. Conclusion Modern psychometric methods based upon the work of Rasch provide a useful approach to the calibration and analysis of EMQ undergraduate medical assessments. The approach allows for a formal test of the unidimensionality of the questions and thus the validity of the summed score. Given the metric calibration which follows fit to the model, it also allows for the establishment of items banks to facilitate continuity and equity in exam standards.
Co, Manuel C; Boden-Albala, Bernadette; Quarles, Leigh; Wilcox, Adam; Bakken, Suzanne
In designing informatics infrastructure to support comparative effectiveness research (CER), it is necessary to implement approaches for integrating heterogeneous data sources such as clinical data typically stored in clinical data warehouses and those that are normally stored in separate research databases. One strategy to support this integration is the use of a concept-oriented data dictionary with a set of semantic terminology models. The aim of this paper is to illustrate the use of the semantic structure of Clinical LOINC (Logical Observation Identifiers, Names, and Codes) in integrating community-based survey items into the Medical Entities Dictionary (MED) to support the integration of survey data with clinical data for CER studies.
Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen
The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A
A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Through item response theory approach, this study examined item observed distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Wilmot, Michael P; Kostal, Jack W; Stillwell, David; Kosinski, Michal
For the past 40 years, the conventional univariate model of self-monitoring has reigned as the dominant interpretative paradigm in the literature. However, recent findings associated with an alternative bivariate model challenge the conventional paradigm. In this study, item response theory is used to develop measures of the bivariate model of acquisitive and protective self-monitoring using original Self-Monitoring Scale (SMS) items, and data from two large, nonstudent samples ( Ns = 13,563 and 709). Results indicate that the new acquisitive (six-item) and protective (seven-item) self-monitoring scales are reliable, unbiased in terms of gender and age, and demonstrate theoretically consistent relations to measures of personality traits and cognitive ability. Additionally, by virtue of using original SMS items, previously collected responses can be reanalyzed in accordance with the alternative bivariate model. Recommendations for the reanalysis of archival SMS data, as well as directions for future research, are provided.
Fox, Gerardus J.A.; Avetisyan, Marianna; van der Palen, Jacobus Adrianus Maria
Misleading response behavior is expected in medical settings where incriminating behavior is negatively related to the recovery from a disease. In the present study, lung patients feel social and professional pressure concerning smoking and experience questions about smoking behavior as sensitive
Wright, Daniel B.; Mathews, Sorcha A.; Skagerberg, Elin M.
When people discuss their memories, what one person says can influence what another personal reports. In 3 studies, participants were shown sets of stimuli and then given recognition memory tests to measure the effect of one person's response on another's. The 1st study (n=24) used word recognition with participant-confederate pairs and found that…
Jacobsen, Kathryn H; Aguirre, A Alonso; Bailey, Charles L; Baranova, Ancha V; Crooks, Andrew T; Croitoru, Arie; Delamater, Paul L; Gupta, Jhumka; Kehn-Hall, Kylene; Narayanan, Aarthi; Pierobon, Mariaelena; Rowan, Katherine E; Schwebach, J Reid; Seshaiyer, Padmanabhan; Sklarew, Dann M; Stefanidis, Anthony; Agouris, Peggy
As the Ebola outbreak in West Africa wanes, it is time for the international scientific community to reflect on how to improve the detection of and coordinated response to future epidemics. Our interdisciplinary team identified key lessons learned from the Ebola outbreak that can be clustered into three areas: environmental conditions related to early warning systems, host characteristics related to public health, and agent issues that can be addressed through the laboratory sciences. In particular, we need to increase zoonotic surveillance activities, implement more effective ecological health interventions, expand prediction modeling, support medical and public health systems in order to improve local and international responses to epidemics, improve risk communication, better understand the role of social media in outbreak awareness and response, produce better diagnostic tools, create better therapeutic medications, and design better vaccines. This list highlights research priorities and policy actions the global community can take now to be better prepared for future emerging infectious disease outbreaks that threaten global public health and security.
Cooper, Alannah Louise; Brown, Janie
Low response rates to surveys have been a long-standing issue in research. This includes research involving nurses and midwives. To gain representative samples, appropriate measures to maximise response rates need to be used. To explore ways to maximise response rates from nurses and midwives, using a hospital-wide survey as an example. All nurses and midwives at the study hospital were invited to participate in a survey. To encourage participation and elicit an adequate response rate, several strategies were used. A total of 1,000 surveys were distributed and 319 (32%) were returned. All the required age groups, levels of experience and types of nursing registration were represented in the responses and data saturation was achieved. It is important to pay attention to obtaining a representative sample. Further investigation of response rates to surveys by nurses and midwives is warranted. Strategies to maximise response rates from a target population should be used when conducting surveys. ©2017 RCN Publishing Company Ltd. All rights reserved. Not to be copied, transmitted or recorded in any way, in whole or part, without prior permission of the publishers.
Shimazu, Akihito; Schaufeli, Wilmar B; Miyanaka, Daisuke; Iwata, Noboru
With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being like work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of both versions was not similar. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees.
Full Text Available Abstract With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being like work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9 and the comparability of this scale between both countries. Item Response Theory (IRT was applied to the data from Japan (N = 2,339 and the Netherlands (N = 13,406. Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement based the test information function (TIF and the standard error of measurement (SEM. The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of both versions was not similar. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees.
Sunderland, Matthew; Batterham, Philip; Calear, Alison; Carragher, Natacha; Baillie, Andrew; Slade, Tim
There is no standardized approach to the measurement of social anxiety. Researchers and clinicians are faced with numerous self-report scales with varying strengths, weaknesses, and psychometric properties. The lack of standardization makes it difficult to compare scores across populations that utilise different scales. Item response theory offers one solution to this problem via equating different scales using an anchor scale to set a standardized metric. This study is the first to equate several scales for social anxiety disorder. Data from two samples (n=3,175 and n=1,052), recruited from the Australian community using online advertisements, were utilised to equate a network of 11 self-report social anxiety scales via a fixed parameter item calibration method. Comparisons between actual and equated scores for most of the scales indicted a high level of agreement with mean differences <0.10 (equivalent to a mean difference of less than one point on the standardized metric). This study demonstrates that scores from multiple scales that measure social anxiety can be converted to a common scale. Re-scoring observed scores to a common scale provides opportunities to combine research from multiple studies and ultimately better assess social anxiety in treatment and research settings. Copyright © 2018. Published by Elsevier Inc.
Dawson, Deborah A; Saha, Tulshi D; Grant, Bridget F
The relative severity of the 11 DSM-IV alcohol use disorder (AUD) criteria are represented by their severity threshold scores, an item response theory (IRT) model parameter inversely proportional to their prevalence. These scores can be used to create a continuous severity measure comprising the total number of criteria endorsed, each weighted by its relative severity. This paper assesses the validity of the severity ranking of the 11 criteria and the overall severity score with respect to known AUD correlates, including alcohol consumption, psychological functioning, family history, antisociality, and early initiation of drinking, in a representative population sample of U.S. past-year drinkers (n=26,946). The unadjusted mean values for all validating measures increased steadily with the severity threshold score, except that legal problems, the criterion with the highest score, was associated with lower values than expected. After adjusting for the total number of criteria endorsed, this direct relationship was no longer evident. The overall severity score was no more highly correlated with the validating measures than a simple count of criteria endorsed, nor did the two measures yield different risk curves. This reflects both within-criterion variation in severity and the fact that the number of criteria endorsed and their severity are so highly correlated that severity is essentially redundant. Attempts to formulate a scalar measure of AUD will do as well by relying on simple counts of criteria or symptom items as by using scales weighted by IRT measures of severity. Published by Elsevier Ireland Ltd.
Lorber, Michael F; Xu, Shu; Slep, Amy M Smith; Bulling, Lisanne; O'Leary, Susan G
The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on 2 community samples of cohabiting parents of 3- to 8-year-old children, combined to yield a total sample size of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater 6-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development.
Lambert, Amber D.; Miller, Angie L.
With the growing reliance on tablets and smartphones for internet access, understanding the effects of completion device on online survey responses becomes increasing important. This study uses data from the Strategic National Arts Alumni Project, a multi-institution online alumni survey designed to obtain knowledge of arts education, to explore…
Full Text Available This study addressed reverse directional item and the number of response categories problems in Likert-type scales. The Fear of Negative Evaluation Scale (FNES and the Oxford Happiness Questionnaire (OHQ were used as data collection tools. The data of the study were analyzed according to the Rasch model. The analysis found that the observed and expected test characteristic curves were largely overlapped, each of the three rating scales worked effectively, and the differences between response categories could be distinguished successfully by the participants in straightforward directional items. On the other hand, it was determined that there were significant differences between the observed and expected test characteristic curves in reverse directional items. It was also found that no matter which one of these three, five and seven-point rating scales was used, the participants could not distinguish the response categories of the reverse directional items on the FNES and the OHQ. Afterwards, the reverse directional items were removed from the data file, and the analysis was repeated. The analysis results revealed that item discrimination, reliability coefficients for person facet, separation ratios and Chi square values calculated for the facets of person and items were higher in five-pointed rating compared to three and seven pointed rating.
Greenlaw, Corey; Brown-Welty, Sharon
Web-based surveys have become more prevalent in areas such as evaluation, research, and marketing research to name a few. The proliferation of these online surveys raises the question, how do response rates compare with traditional surveys and at what cost? This research explored response rates and costs for Web-based surveys, paper surveys, and…
MacCann, Robert G.; Stanley, Gordon
An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…
Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M
Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4* individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
Torres van Grinsven, Vanessa; Bolko, Irena; Bavdaz, Mojca
Increasing reluctance of businesses to participate in surveys often leads to declining or low response rates, poor data quality and burden complaints, and suggests that a driving force, that is, the motivation for participation and accurate and timely response, is insufficient or lacking.
most examinees. Therefore it appears psychometrically ac - ceptable for the CAT -ASVAB project to proceed without item recalibration based on...MEMORANDUM DETERMINING THE SENSITIVITY OF CAT -ASVAB SCORES TO CHANGES IN ITEM RESPONSE CURVES WITH THE MEDIUM OF ADMINISTRATION D. R. Divgi...Subj: Center for Naval Analyses Research Memorandum 86-189 End: (1) CNA Research Memorandum 86-189, "Determining the Sensitivity of CAT -ASVAB
Zhong, Qiuyue; Gelaye, Bizu; Fann, Jesse R; Sanchez, Sixto E; Williams, Michelle A
We sought to evaluate the validity of the Spanish language version of the patient health questionnaire-9 (PHQ-9) depression scale in a large sample of pregnant Peruvian women using Rasch item response theory (IRT) approaches. We further sought to examine the appropriateness of the response formats, reliability and potential differential item functioning (DIF) by maternal age, educational attainment and employment status. This cross-sectional study was conducted among 1520 pregnant women in Lima, Peru. A structured interview was used to collect information on demographic characteristics and PHQ-9 items. Data from the PHQ-9 were fitted to the Rasch IRT model and tested for appropriate category ordering, the assumptions of unidimensionality and local independence, item fit, reliability and presence of DIF. The Spanish language version of PHQ-9 demonstrated unidimensionality, local independence, and acceptable fit for the Rasch IRT model. However, we detected disordered response categories for the original four response categories. After collapsing "more than half the days" and "nearly every day", the response categories ordered properly and the PHQ-9 fit the Rasch IRT model. The PHQ-9 had moderate internal consistency (person separation index, PSI=0.72). Additionally, the items of PHQ-9 were free of DIF with regard to age, educational attainment, and employment status. The Spanish language version of the PHQ-9 was shown to have item properties of an effective screening instrument. Collapsing rating scale categories and reconstructing three-point Likert scale for all items improved the fit of the instrument. Future studies are warranted to establish new cutoff scores and criterion validity of the three-point Likert scale response options for the Spanish language version of the PHQ-9. Copyright © 2014 Elsevier B.V. All rights reserved.
Chen Changmao; Liu Jinhua; Xie Jianlun; Su Jingling
The energy response and directionality of neutron survey meter type MK7 and 2202D are determined. The reactor thermal column beam, reactor filtered beams (6 eV, 24.4 keV and 144 keV), 226 Ra-Be, 241 Am-Be, 252 Cf and its moderated sources are used for the measurement. The results shows: the survey meters are influenced obviously by the direction; the response of middle-energy region is large, the energy response of 2202D is better than MK7
Ishimoto, Michi; Davenport, Glen; Wittmann, Michael C.
Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students' views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE) is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had similar misconceptions as those of American students. In this study, we used item response curves (IRCs) to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
Full Text Available Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students’ views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had similar misconceptions as those of American students. In this study, we used item response curves (IRCs to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul
The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…
Deding, Mette; Fridberg, Torben; Jakobsen, Vibeke
The purpose of this paper is to study how various characteristics of respondents and interviewers affect non-response among immigrants. We use a survey conducted among immigrants in Denmark and ethnic Danes. First, we analyse the determinants of overall non-response. Second, we analyse how...... the determinants of contact and of response given contact differ. We find that characteristics of the respondents are important for the response rate – especially they are important for the probability of getting in contact with the respondent. The lower probability of response among immigrants compared to ethnic...
MATERIEL SUPPLY CHAIN: LEVERAGING NEW METRICS TO IDENTIFY AND CLASSIFY ITEMS OF CONCERN by Andrew R. Haley June 2016 Thesis Advisor: Robert...IDENTIFY AND CLASSIFY ITEMS OF CONCERN 5. FUNDING NUMBERS 6. AUTHOR(S) Andrew R. Haley 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Naval...Supply Systems Command (NAVSUP), logistics, inventory, consumable, NSNs at Risk, Bad Actors, Bad Actors with Trend, items of concern , customer time
Torres van Grinsven Vanessa
Full Text Available Increasing reluctance of businesses to participate in surveys often leads to declining or low response rates, poor data quality and burden complaints, and suggests that a driving force, that is, the motivation for participation and accurate and timely response, is insufficient or lacking. Inspiration for ways to remedy this situation has already been sought in the psychological theory of self-determination; previous research has favored enhancement of intrinsic motivation compared to extrinsic motivation. Traditionally however, enhancing extrinsic motivation has been pervasive in business surveys. We therefore review this theory in the context of business surveys using empirical data from the Netherlands and Slovenia, and suggest that extrinsic motivation calls for at least as much attention as intrinsic motivation, that other sources of motivation may be relevant besides those stemming from the three fundamental psychological needs (competence, autonomy and relatedness, and that other approaches may have the potential to better explain some aspects of motivation in business surveys (e.g., implicit motives. We conclude with suggestions that survey organizations can consider when attempting to improve business survey response behavior.
Egberink, Iris J L; Meijer, Rob R
The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343) were compared. The authors found that most scales formed weak scales and that measurement precision was relatively low and only present for latent trait values indicating low self-perception. The subscales Physical Appearance and Global Self-Worth formed one strong scale. Children seem to interpret Global Self-Worth items as if they measure Physical Appearance. Furthermore, the authors found that strong Mokken scales (such as Global Self-Worth) consisted mostly of items that repeat the same item content. They conclude that researchers should be very careful in interpreting the total scores on the different Self-Perception Profile for Children scales. Finally, implications for further research are discussed.
Crede, Erin; Borrego, Maura
As part of a sequential exploratory mixed methods study, 9 months of ethnographically guided observations and interviews were used to develop a survey examining graduate engineering student retention. Findings from the ethnographic fieldwork yielded several themes, including international diversity, research group organization and climate,…
Light gm Slas 14 14 Lis Smm pmlew aumk Simm Milk glass 1 Modern brown glass I Olive glass _ Tin cm key 11 Shotg cartridge 1 Slate 1 Mortar 1 1 Piece of...Parish, Louisiana. Anthropological Report No. 1. Archaeological Survey and Antiquities Commission, Department of Culture, Recreation and Tourism , Baton
Sibley, Chris G; Houkamau, Carla A
We argue that there is a need for culture-specific measures of identity that delineate the factors that most make sense for specific cultural groups. One such measure, recently developed specifically for Māori peoples, is the Multi-Dimensional Model of Māori Identity and Cultural Engagement (MMM-ICE). Māori are the indigenous peoples of New Zealand. The MMM-ICE is a 6-factor measure that assesses the following aspects of identity and cultural engagement as Māori: (a) group membership evaluation, (b) socio-political consciousness, (c) cultural efficacy and active identity engagement, (d) spirituality, (e) interdependent self-concept, and (f) authenticity beliefs. This article examines the scale properties of the MMM-ICE using item response theory (IRT) analysis in a sample of 492 Māori. The MMM-ICE subscales showed reasonably even levels of measurement precision across the latent trait range. Analysis of age (cohort) effects further indicated that most aspects of Māori identification tended to be higher among older Māori, and these cohort effects were similar for both men and women. This study provides novel support for the reliability and measurement precision of the MMM-ICE. The study also provides a first step in exploring change and stability in Māori identity across the life span. A copy of the scale, along with recommendations for scale scoring, is included.
Full Text Available In spite of the growing interest in the methods of evaluating the classification consistency (CC indices, only few researches are available in the field of applying these methods in the practice of large-scale educational assessment. In addition, only few studies considered the influence of practical factors, for example, the examinee ability distribution, the cut score location and the score scale, on the performance of CC indices. Using the newly developed Lee's procedure based on the item response theory (IRT, the main purpose of this study is to investigate the performance of CC indices when practical factors are taken into consideration. A simulation study and an empirical study were conducted under comprehensive conditions. Results suggested that with negatively skewed distribution, the CC indices were larger than with other distributions. Interactions occurred among ability distribution, cut score location, and score scale. Consequently, Lee's IRT procedure is reliable to be used in the field of large-scale educational assessment, and when reporting the indices, it should be treated with caution as testing conditions may vary a lot.
Laplante-Lévesque, Ariane; Hickson, Louise; Worrall, Linda
This study investigated how clients quantify use of hearing rehabilitation. Comparisons focused on the daily-use item of the International Outcome Inventory for Hearing Aids (IOI-HA), and for Alternative Interventions (IOI-AI). Adults with hearing impairment completed the original versions of the IOI-HA and the IOI-AI daily-use item which has five numerical response options (e.g. 1-4 hours/day) and a modified version with five word response options (e.g. 'Sometimes'). Respondents completed both IOI versions immediately after intervention completion and three months later. In total, 64 people who had obtained hearing aids completed both IOI-HA versions and 27 people who had participated in communication programs completed both IOI-AI versions. Participants reported higher scores on the modified (word) daily-use item than on the original (number) daily-use item. Participants who completed the IOI-AI did so significantly more than participants who completed the IOI-HA. This was true both after intervention completion and three months later. This study showed that comparisons between IOI-HA and IOI-AI daily-use item scores should be made with caution. Word daily-use response options are recommended for the IOI-AI (i.e. Never; Rarely; Sometimes; Often; and Almost always).
Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development-classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)-in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
Kang, Yu-kun; Guo, Wan-jun; Xu, Hao; Chen, Yue-hui; Li, Xiao-jing; Tan, Zheng-ping; Li, Na; Gesang, Ze-Ren; Wang, Ying-mei; Liu, Chang-bo; Luo, Ying; Feng, Jia; Xu, Qiu-jie; Lee, Sing; Li, Tao
To evaluate the psychometric properties of the 6-item Kessler psychological distress scale (K6) in screening for serious mental illness (SMI) among undergraduates in a major comprehensive university in China. The K6 was self-completed by 8289 randomly sampled participants. A group of them (n=222) were re-assessed using K6 and interviewed using the Chinese version of Composite International Diagnostic Interview 3.1 (CIDI-3.1). The test-retest reliability of the K6 scale was 0.79, the Cronbach's alpha was 0.84, and its area under the receiver operating curve (AUC) for diagnosing CIDI-3.1 SMI was 0.85 (95% CI=0.80-0.90). For the optimal cut-off of K6 (12/13), the sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), and classification accuracy (AC) were 0.83, 0.79, 0.60, 0.93, and 0.80, respectively. The 12-month prevalence of SMI was estimated as 3.97% using this optimal cut-off. Binary logistic regression analysis (including gender, ethnicity, grade, number of siblings and family residency location) showed that only family residency location in rural areas compared to urban areas was significantly associated with more SMI. This study documented the value of using the K6 for detecting SMI in Chinese undergraduate populations and supported its cross-cultural reliability and validity. Copyright © 2015 Elsevier Inc. All rights reserved.
Podsakoff, Nathan P; Whiting, Steven W; Welsh, David T; Mai, Ke Michael
Despite the increased attention paid to biases attributable to common method variance (CMV) over the past 50 years, researchers have only recently begun to systematically examine the effect of specific sources of CMV in previously published empirical studies. Our study contributes to this research by examining the extent to which common rater, item, and measurement context characteristics bias the relationships between organizational citizenship behaviors and performance evaluations using a mixed-effects analytic technique. Results from 173 correlations reported in 81 empirical studies (N = 31,146) indicate that even after controlling for study-level factors, common rater and anchor point number similarity substantially biased the focal correlations. Indeed, these sources of CMV (a) led to estimates that were between 60% and 96% larger when comparing measures obtained from a common rater, versus different raters; (b) led to 39% larger estimates when a common source rated the scales using the same number, versus a different number, of anchor points; and (c) when taken together with other study-level predictors, accounted for over half of the between-study variance in the focal correlations. We discuss the implications for researchers and practitioners and provide recommendations for future research. PsycINFO Database Record (c) 2013 APA, all rights reserved
Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)'based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.
Torres van Grinsven Vanessa; Bolko Irena; Bavdaž Mojca
Increasing reluctance of businesses to participate in surveys often leads to declining or low response rates, poor data quality and burden complaints, and suggests that a driving force, that is, the motivation for participation and accurate and timely response, is insufficient or lacking. Inspiration for ways to remedy this situation has already been sought in the psychological theory of self-determination; previous research has favored enhancement of intrinsic motivation compared to extrinsi...
Kristi L Santi
Full Text Available Visual processing has been widely studied in regard to its impact on a students’ ability to read. A less researched area is the role of reading in the development of visual processing skills. A cohort-sequential, accelerated-longitudinal design was utilized with 932 kindergarten, first, and second grade students to examine the impact of reading acquisition on the processing of various types of visual discrimination and visual motor test items. Students were assessed four times per year on a variety of reading measures and reading precursors and two popular measures of visual processing over a three-year period. Explanatory item response models were used to examine the roles of person and item characteristics on changes in visual processing abilities and changes in item difficulties over time. Results showed different developmental patterns for five types of visual processing test items, but most importantly failed to show consistent effects of learning to read on changes in item difficulty. Thus, the present study failed to find support for the hypothesis that learning to read alters performance on measures of visual processing. Rather, visual processing and reading ability improved together over time with no evidence to suggest cross-domain influences from reading to visual processing. Results are discussed in the context of developmental theories of visual processing and brain-based research on the role of visual skills in learning to read.
Didier, C.; Huet, R.
In this paper, we present and discuss the results of a survey of how corporate social responsibility (CSR) is being discussed and taught in engineering education in France. We shall first describe how those questions have been recently tackled in various programmes of higher education in France. We shall also analyse what faculty members have to…
A questionnaire survey of senior house officers/registrars response to their training at University College Hospital, Ibadan. ... A regular conduct of auditing of training programmes is recommended. Keywords: Questionnaire ... spécialistes. Nous proposons l'organisation régulière de la verification de programme de formation.
Cohen, Alexander H.; Krueger, James S.
Recent social scientific research has examined connections between public opinion and weather conditions. This article contributes to this literature by analyzing the relationship between high temperature and survey response. Because hot temperatures are associated with aggression, irritation, and negativity, such conditions should lead to the…
Proposta de um instrumento de medida para avaliar a satisfação de clientes de bancos utilizando a Teoria da Resposta ao Item Proposal of tool to assess the satisfaction of bank customers using the Item Response Theory
Alceu Balbim Junior
Full Text Available Este artigo apresenta um instrumento de medida para avaliação da satisfação de clientes de bancos utilizando a Teoria da Resposta ao Item (TRI. Satisfazer os clientes tem sido uma busca constante das organizações que procuram manterem-se competitivas no mercado. Estudos constatam a relação entre a qualidade percebida pelos clientes, a satisfação e fidelidade. A avaliação da satisfação pode ser realizada por meio da qualidade percebida pelos clientes e a construção de ferramentas de avaliação deve contemplar características específicas da atividade em questão. Embasando-se em artigos que avaliam a satisfação de clientes de bancos, propõe-se um instrumento formado por 29 itens. Os itens foram aplicados a 240 clientes a fim de avaliar a satisfação com o banco de maior relacionamento. Utilizando a Teoria da Resposta ao Item, foram identificados os parâmetros dos itens e a curva de informação. A análise do grau de discriminação dos itens indicou que todos são apropriados. A curva de informação obtida evidenciou o intervalo no qual o instrumento apresenta melhores estimativas para níveis de satisfação. O trabalho apresentou o nível médio de satisfação da amostra e a concentração de clientes nos diferentes níveis de satisfação da escala.This paper presents a model for assessing the satisfaction of bank customers using the Item Response Theory (IRT. Organizations are constantly making effort to satisfy customers seeking to remain competitive. Several studies have reported on the relationship between perceived quality, satisfaction, and loyalty. The assessment of satisfaction can be accomplished through the perceived quality, and the development of assessment tools should address specific features of the activity in question. Based on articles that assess the satisfaction of bank customers, this study proposes an assessment tool consisting of 29 items. The items were applied to 240 clients to assess their
Desenvolvimento de uma escala para medir o potencial empreendedor utilizando a Teoria da Resposta ao Item (TRI Development of a scale to measure the entrepreneurial potential using the Item Response Theory (IRT
Luciano Ricardo Rath Alves
Full Text Available Diversas variáveis estão relacionadas ao desenvolvimento da atividade empreendedora, verifica-se, entre elas, a importância do agente empreendedor. Dos estudos que contribuem para o seu entendimento, este segue a linha que defende que o empreendedor tem características e traços de personalidade singulares em relação à população, os quais são propícios ao sucesso do empreendedorismo. O objetivo deste trabalho é desenvolver uma escala para medir o potencial empreendedor utilizando a Teoria da Resposta ao Item. Foi utilizado o modelo logístico de dois parâmetros da TRI. As estimativas dos parâmetros foram obtidas a partir da amostra com 764 pessoas que responderam a um instrumento composto por 103 itens. A curva de informação e do erro padrão do teste e a interpretação qualitativa de níveis da escala permitiram determinar o intervalo mais apropriado para utilização do instrumento. Os resultados mostraram que a escala é mais adequada para avaliar indivíduos com baixo até moderadamente alto potencial empreendedor. Por isso, sugere-se que novos itens sejam incorporados ao instrumento para mensurar e interpretar níveis ainda mais elevados. A Teoria da Resposta ao Item permite que novos itens sejam calibrados a fim de mensurar os empreendedores com alto potencial empreendedor, aproveitando os dados já obtidos.Several variables are related to the development of entrepreneurial activities. An important one among them is the entrepreneurial agent. This study is one of many that contribute to the understanding of the entrepreneurial agent. In its line of thought, it upholds the idea that the entrepreneur has characteristics and personality traits that stand out from the general population and that are favorable to the success of the entrepreneurship. This study aims at developing a measurement scale for entrepreneurial potential using the Item Response Theory. The items were generated by Santos (2008 based on a theoretical model
De Champlain, Andre F; Boulais, Andre-Philippe; Dallas, Andrew
The aim of this research was to compare different methods of calibrating multiple choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada's Qualifying Examination Part I (MCCQEI) based on item response theory. Our data consisted of test results from 8,213 first time applicants to MCCQEI in spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed item format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all 3 polytomous models, whether the MCQs were either anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed format concurrent 2-PL graded response model (pass rate= 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate= 85.21%). Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
Andre F. De Champlain
Full Text Available Purpose: The aim of this research was to compare different methods of calibrating multiple choice question (MCQ and clinical decision making (CDM components for the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI based on item response theory. Methods: Our data consisted of test results from 8,213 first time applicants to MCCQEI in spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed item format (dichotomous MCQ responses and polytomous CDM case scores calibrations were conducted using PARSCALE 4. Results: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02. In all 3 polytomous models, whether the MCQs were either anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed format concurrent 2-PL graded response model (pass rate= 80.43% and the dichotomous anchored 1-PL calibrations (pass rate= 85.21%. Conclusion: Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
Panouillères, M; Anota, A; Nguyen, T V; Brédart, A; Bosset, J F; Monnier, A; Mercier, M; Hardouin, J B
The present study investigates the properties of the French version of the OUT-PATSAT35 questionnaire, which evaluates the outpatients' satisfaction with care in oncology using classical analysis (CTT) and item response theory (IRT). This cross-sectional multicenter study includes 692 patients who completed the questionnaire at the end of their ambulatory treatment. CTT analyses tested the main psychometric properties (convergent and divergent validity, and internal consistency). IRT analyses were conducted separately for each OUT-PATSAT35 domain (the doctors, the nurses or the radiation therapists and the services/organization) by models from the Rasch family. We examined the fit of the data to the model expectations and tested whether the model assumptions of unidimensionality, monotonicity and local independence were respected. A total of 605 (87.4%) respondents were analyzed with a mean age of 64 years (range 29-88). Internal consistency for all scales separately and for the three main domains was good (Cronbach's α 0.74-0.98). IRT analyses were performed with the partial credit model. No disordered thresholds of polytomous items were found. Each domain showed high reliability but fitted poorly to the Rasch models. Three items in particular, the item about "promptness" in the doctors' domain and the items about "accessibility" and "environment" in the services/organization domain, presented the highest default of fit. A correct fit of the Rasch model can be obtained by dropping these items. Most of the local dependence concerned items about "information provided" in each domain. A major deviation of unidimensionality was found in the nurses' domain. CTT showed good psychometric properties of the OUT-PATSAT35. However, the Rasch analysis revealed some misfitting and redundant items. Taking the above problems into consideration, it could be interesting to refine the questionnaire in a future study.
Jang, Yuri; Kwag, Kyung Hwa; Chiriboga, David A
Given the emphasis on modesty and self-effacement in Asian societies, the present study explored differential item responses for 2 positive affect items (5 = Hopeful and 8 = Happy) on a short form of the Center for Epidemiologic Studies-Depression scale. The samples consisted of elderly non-Hispanic Whites (n = 450), Korean Americans (n = 519), and Koreans (n = 2,030). Multiple Indicator Multiple Cause models were estimated to identify the impact of group membership on responses to the positive affect items while controlling for the latent trait of depressive symptoms. The data revealed that Koreans and Korean Americans were less likely than non-Hispanic Whites to endorse the positive affect items. Compared with Korean Americans who were more acculturated to mainstream American culture, those who were less acculturated were less likely to endorse the positive affect items. Our findings support the notion that the way in which people endorse depressive symptoms is substantially influenced by cultural orientation. These findings call into question the common use of simple mean comparisons and a universal cutoff point across diverse cultural groups.
Full Text Available Fajrianthi,1 Rizqy Amelia Zein2 1Department of Industrial and Organizational Psychology, 2Department of Personality and Social Psychology, Faculty of Psychology, Universitas Airlangga, Surabaya, East Java, Indonesia Abstract: This study aimed to develop an emotional intelligence (EI test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA] was designed to measure three EI domains: 1 emotional appraisal, 2 emotional recognition, and 3 emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA and item response theory (IRT were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF was 3.414 (ability level = 0 for subset 1, 12.183 for subset 2 (ability level = -2, and 2.398 for subset 3 (level of ability = -2. It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth to note that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA’s item analysis and dimensionality test of each TKEA subset. Keywords: categorical confirmatory factor analysis, emotional intelligence, item response theory
Rahmaniar, Andinisa; Rusnayati, Heni; Sutiadi, Asep
While solving physics problem particularly in force matter, it is needed to have the ability of constructing free body diagrams which can help students to analyse every force which acts on an object, the length of its vector and the naming of its force. Mix method was used to explain the result without any special treatment to participants. The participants were high school students in first grade totals 35 students. The purpose of this study is to identify students' ability level of constructing free body diagrams in solving restricted and structured response items. Considering of two types of test, every student would be classified into four levels ability of constructing free body diagrams which is every level has different characteristic and some students were interviewed while solving test in order to know how students solve the problem. The result showed students' ability of constructing free body diagrams on restricted response items about 34.86% included in no evidence of level, 24.11% inadequate level, 29.14% needs improvement level and 4.0% adequate level. On structured response items is about 16.59% included no evidence of level, 23.99% inadequate level, 36% needs improvement level, and 13.71% adequate level. Researcher found that students who constructed free body diagrams first and constructed free body diagrams correctly were more successful in solving restricted and structured response items.
Dragon, Wendy R.; Ben-Porath, Yossef S.; Handel, Richard W.
This article examined the impact of unscorable item responses on the psychometric validity and practical interpretability of scores on the Restructured Clinical (RC) Scales of the Minnesota Multiphasic Personality Inventory-2/Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2/MMPI-2-RF). In analyses conducted with five…
Deng, Weiling; Monfils, Lora
Using simulated data, this study examined the impact of different levels of stringency of the valid case inclusion criterion on item response theory (IRT)-based true score equating over 5 years in the context of K-12 assessment when growth in student achievement is expected. Findings indicate that the use of the most stringent inclusion criterion…
Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe
Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…
Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…
Carlson, James E.; von Davier, Matthias
Few would doubt that ETS researchers have contributed more to the general topic of item response theory (IRT) than individuals from any other institution. In this report, we briefly review most of those contributions, dividing them into sections by decades of publication, beginning with early work by Fred Lord and Bert Green in the 1950s and…
Egberink, I.J.L.; Meijer, R.R.
The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343)
Egberink, Iris J. L.; Meijer, Rob R.
The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343)
Investigating Robustness of Item Response Theory Proficiency Estimators to Atypical Response Behaviors under Two-Stage Multistage Testing. ETS GRE® Board Research Report. ETS GRE®-16-03. ETS Research Report No. RR-16-22
Kim, Sooyeon; Moses, Tim
The purpose of this study is to evaluate the extent to which item response theory (IRT) proficiency estimation methods are robust to the presence of aberrant responses under the "GRE"® General Test multistage adaptive testing (MST) design. To that end, a wide range of atypical response behaviors affecting as much as 10% of the test items…
Bolcic-Jankovic, Dragana; Lu, Fengxin; Colten, Mary Ellen; McCarthy, Ellen P
We report the results from cognitive interviews with Asian American patients and their caregivers. We interviewed seven caregivers and six patients who were all bilingual Asian Americans. The main goal of the cognitive interviews was to test a survey instrument developed for a study about perspectives of Asian American patients with advanced cancer who are facing decisions around end-of-life care. We were particularly interested to see whether items commonly used in White and Black populations are culturally meaningful and equivalent in Asian populations, primarily those of Chinese and Vietnamese ethnicity. Our exploration shows that understanding respondents' language proficiency, degree of acculturation, and cultural context of receiving, processing, and communicating information about medical care can help design questions that are appropriate for Asian American patients and caregivers, and therefore can help researchers obtain quality data about the care Asian American cancer patients receive. © The Author(s) 2016.
Charles Robert Ehlschlaeger
Full Text Available This paper presents a methodology of mapping population-centric social, infrastructural, and environmental metrics at neighborhood scale. This methodology extends traditional survey analysis methods to create cartographic products useful in agent-based modeling and geographic information analysis. It utilizes and synthesizes survey microdata, sub-upazila attributes, land use information, and ground truth locations of attributes to create neighborhood scale multi-attribute maps. Monte Carlo methods are employed to combine any number of survey responses to stochastically weight survey cases and to simulate survey cases' locations in a study area. Through such Monte Carlo methods, known errors from each of the input sources can be retained. By keeping individual survey cases as the atomic unit of data representation, this methodology ensures that important covariates are retained and that ecological inference fallacy is eliminated. These techniques are demonstrated with a case study from the Chittagong Division in Bangladesh. The results provide a population-centric understanding of many social, infrastructural, and environmental metrics desired in humanitarian aid and disaster relief planning and operations wherever long term familiarity is lacking. Of critical importance is that the resulting products have easy to use explicit representation of the errors and uncertainties of each of the input sources via the automatically generated summary statistics created at the application's geographic scale.
Couvy-Duchesne, Baptiste; Davenport, Tracey A; Martin, Nicholas G; Wright, Margaret J; Hickie, Ian B
The Somatic and Psychological HEalth REport (SPHERE) is a 34-item self-report questionnaire that assesses symptoms of mental distress and persistent fatigue. As it was developed as a screening instrument for use mainly in primary care-based clinical settings, its validity and psychometric properties have not been studied extensively in population-based samples. We used non-parametric Item Response Theory to assess scale validity and item properties of the SPHERE-34 scales, collected through four waves of the Brisbane Longitudinal Twin Study (N = 1707, mean age = 12, 51% females; N = 1273, mean age = 14, 50% females; N = 1513, mean age = 16, 54% females, N = 1263, mean age = 18, 56% females). We estimated the heritability of the new scores, their genetic correlation, and their predictive ability in a sub-sample (N = 1993) who completed the Composite International Diagnostic Interview. After excluding items most responsible for noise, sex or wave bias, the SPHERE-34 questionnaire was reduced to 21 items (SPHERE-21), comprising a 14-item scale for anxiety-depression and a 10-item scale for chronic fatigue (3 items overlapping). These new scores showed high internal consistency (alpha > 0.78), moderate three months reliability (ICC = 0.47-0.58) and item scalability (Hi > 0.23), and were positively correlated (phenotypic correlations r = 0.57-0.70; rG = 0.77-1.00). Heritability estimates ranged from 0.27 to 0.51. In addition, both scores were associated with later DSM-IV diagnoses of MDD, social anxiety and alcohol dependence (OR in 1.23-1.47). Finally, a post-hoc comparison showed that several psychometric properties of the SPHERE-21 were similar to those of the Beck Depression Inventory. The scales of SPHERE-21 measure valid and comparable constructs across sex and age groups (from 9 to 28 years). SPHERE-21 scores are heritable, genetically correlated and show good predictive ability of mental health in an Australian-based population
Akinyomi, Oladele John
Based on stakeholders’ theory, this study examined the practice of corporate social responsibility by manufacturing companies in Nigeria. It employed survey research design to study 15 randomly selected companies in the food and beverages sector. A total of 225 questionnaires were administered to collect data. Data analysis revealed that CSR is a familiar concept in the sector as most of the companies do engage in CSR activities regularly. The major areas of focus of the CSR activities includ...
Are symptom features of depression during pregnancy, the postpartum period and outside the peripartum period distinct? Results from a nationally representative sample using item response theory (IRT).
Hoertel, Nicolas; López, Saioa; Peyre, Hugo; Wall, Melanie M; González-Pinto, Ana; Limosin, Frédéric; Blanco, Carlos
Whether there are systematic differences in depression symptom expression during pregnancy, the postpartum period and outside these periods (i.e., outside the peripartum period) remains debated. The aim of this study was to use methods based on item response theory (IRT) to examine, after equating for depression severity, differences in the likelihood of reporting DSM-IV symptoms of major depressive episode (MDE) in women of childbearing age (i.e., aged 18-50) during pregnancy, the postpartum period and outside the peripartum period. We conducted these analyses using a large, nationally representative sample of women of childbearing age from the United States (n = 11,256) who participated in the second wave of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). The overall 12-month prevalence of all depressive criteria (except for worthlessness/guilt) was significantly lower in pregnant women than in women of childbearing age outside the peripartum period, whereas the prevalence of all symptoms (except for "psychomotor symptoms") was not significantly different between the postpartum and the nonperipartum group. There were no clinically significant differences in the endorsement rates of symptoms of MDE by pregnancy status when equating for levels of depression severity. This study suggests that the clinical presentation of depressive symptoms in women of childbearing age does not differ during pregnancy, the postpartum period and outside the peripartum period. These findings do not provide psychometric support for the inclusion of the peripartum onset specifier for major depressive disorder in the DSM-5. © 2014 Wiley Periodicals, Inc.
Anderson, Ariana E; Reise, Steven P; Marder, Stephen R; Mansolf, Maxwell; Han, Carol; Bilder, Robert M
Objective: Total scale scores derived by summing ratings from the 30-item PANSS are commonly used in clinical trial research to measure overall symptom severity, and percentage reductions in the total scores are sometimes used to document the efficacy of treatment. Acknowledging that some patients may have substantial changes in PANSS total scores but still be sufficiently symptomatic to warrant diagnosis, ratings on a subset of 8 items, referred to here as the "Remission set," are sometimes used to determine if patients' symptoms no longer satisfy diagnostic criteria. An unanswered question remains: is the goal of treatment better conceptualized as reduction in overall symptom severity, or reduction in symptoms below the threshold for diagnosis? We evaluated the psychometric properties of PANSS total scores, to assess whether having low symptom severity post-treatment is equivalent to attaining Remission. Design: We applied a bifactor item response theory (IRT) model to post-treatment PANSS ratings of 3,647 subjects diagnosed with schizophrenia assessed at the termination of 11 clinical trials. The bifactor model specified one general dimension to reflect overall symptom severity, and five domain-specific dimensions. We assessed how PANSS item discrimination and information parameters varied across the range of overall symptom severity (θ), with a special focus on low levels of symptoms (i.e., θexpected PANSS item score of 1.83, a rating between "Absent" and "Minimal" for a PANSS symptom. Results: The application of the bifactor IRT model revealed: (1) 88% of total score variation was attributable to variation in general symptom severity, and only 8% reflected secondary domain factors. This implies that a general factor may provide a good indicator of symptom severity, and that interpretation is not overly complicated by multidimensionality; (2) Post-treatment, 534 individuals (about 15% of the whole sample) scored in the "Relief" range of general symptom
Background The occurrence of response shift (RS) in longitudinal health-related quality of life (HRQoL) studies, reflecting patient adaptation to disease, has already been demonstrated. Several methods have been developed to detect the three different types of response shift (RS), i.e. recalibration RS, 2) reprioritization RS, and 3) reconceptualization RS. We investigated two complementary methods that characterize the occurrence of RS: factor analysis, comprising Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA), and a method of Item Response Theory (IRT). Methods Breast cancer patients (n = 381) completed the EORTC QLQ-C30 and EORTC QLQ-BR23 questionnaires at baseline, immediately following surgery, and three and six months after surgery, according to the “then-test/post-test” design. Recalibration was explored using MCA and a model of IRT, called the Linear Logistic Model with Relaxed Assumptions (LLRA) using the then-test method. Principal Component Analysis (PCA) was used to explore reconceptualization and reprioritization. Results MCA highlighted the main profiles of recalibration: patients with high HRQoL level report a slightly worse HRQoL level retrospectively and vice versa. The LLRA model indicated a downward or upward recalibration for each dimension. At six months, the recalibration effect was statistically significant for 11/22 dimensions of the QLQ-C30 and BR23 according to the LLRA model (p ≤ 0.001). Regarding the QLQ-C30, PCA indicated a reprioritization of symptom scales and reconceptualization via an increased correlation between functional scales. Conclusions Our findings demonstrate the usefulness of these analyses in characterizing the occurrence of RS. MCA and IRT model had convergent results with then-test method to characterize recalibration component of RS. PCA is an indirect method in investigating the reprioritization and reconceptualization components of RS. PMID:24606836
Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia
Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
Background A widely discussed design issue in patient satisfaction questionnaires is the optimal length and labelling of the answering scale. The aim of the present study was to compare intra-individually the answers on two response scales to five general questions evaluating patients’ perception of hospital care. Methods Between November 2011 and January 2012, all in-hospital patients at a Swiss University Hospital received a patient satisfaction questionnaire on an adjectival scale with three to four labelled categories (LS) and five redundant questions displayed on an 11-point end-anchored numeric scale (NS). The scales were compared concerning ceiling effect, internal consistency (Cronbach’s alpha), individual item answers (Spearman’s rank correlation), and concerning overall satisfaction by calculating an overall percentage score (sum of all answers related to the maximum possible sum). Results The response rate was 41% (2957/7158), of which 2400 (81%) completely filled out all questions. Baseline characteristics of the responders and non-responders were similar. Floor and ceiling effect were high on both response scales, but more pronounced on the LS than on the NS. Cronbach’s alpha was higher on the NS than on the LS. There was a strong individual item correlation between both answering scales in questions regarding the intent to return, quality of treatment and the judgement whether the patient was treated with respect and dignity, but a lower correlation concerning satisfactory information transfer by physicians or nurses, where only three categories were available in the LS. The overall percentage score showed a comparable distribution, but with a wider spread of lower satisfaction in the NS. Conclusions Since the longer scale did not substantially reduce the ceiling effect, the type of questions rather than the type of answering scale could be addressed with a focus on specific questions about concrete situations instead of general questions
Cao, Rui; Nosofsky, Robert M.; Shiffrin, Richard M.
In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across…
Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader
The purpose of the present study is developing a test to measure the numerical ability for students of education. The sample of the study consisted of (504) students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed that acceptable psychometric properties of the test;…
Bjorner, J. B.; Petersen, M. Aa; Groenvold, M.; Aaronson, N.; Ahlner-Elmqvist, M.; Arraras, J. I.; Brédart, A.; Fayers, P.; Jordhoy, M.; Sprangers, M.; Watson, M.; Young, T.
Background: As part of a larger study whose objective is to develop an abbreviated version of the EORTC QLQ-C30 suitable for research in palliative care, analyses were conducted to determine the feasibility of generating a shorter version of the 4-item emotional functioning (EF) scale that could be
Porter, Stephen R.; Whitcomb, Michael E.
What causes a student to participate in a survey? This paper looks at participation across multiple surveys to understand survey non-response; by using multiple surveys we minimize the impact of survey salience. Students at a selective liberal arts college were administered four different surveys throughout the 2002-2003 academic year, and we use…
Young, Eric; Stickrath, Chad; McNulty, Monica C; Calderon, Aaron J; Chapman, Elizabeth; Gonzalo, Jed D; Kuperman, Ethan F; Lopez, Max; Smith, Christopher J; Sweigart, Joseph R; Theobald, Cecelia N; Burke, Robert E
Medical residents are routinely entrusted with transitions of care, yet little is known about the duration or content of their perceived responsibility for patients they discharge from the hospital. To examine the duration and content of internal medicine residents' perceived responsibility for patients they discharge from the hospital. The secondary objective was to determine whether specific individual experiences and characteristics correlate with perceived responsibility. Multi-site, cross-sectional 24-question survey delivered via email or paper-based form. Internal medicine residents (post-graduate years 1-3) at nine university and community-based internal medicine training programs in the United States. Perceived responsibility for patients after discharge as measured by a previously developed single-item tool for duration of responsibility and novel domain-specific questions assessing attitudes towards specific transition of care behaviors. Of 817 residents surveyed, 469 responded (57.4 %). One quarter of residents (26.1 %) indicated that their responsibility for patients ended at discharge, while 19.3 % reported perceived responsibility extending beyond 2 weeks. Perceived duration of responsibility did not correlate with level of training (P = 0.57), program type (P = 0.28), career path (P = 0.12), or presence of burnout (P = 0.59). The majority of residents indicated they were responsible for six of eight transitional care tasks (85.1-99.3 % strongly agree or agree). Approximately half of residents (57 %) indicated that it was their responsibility to directly contact patients' primary care providers at discharge. and 21.6 % indicated that it was their responsibility to ensure that patients attended their follow-up appointments. Internal medicine residents demonstrate variability in perceived duration of responsibility for recently discharged patients. Neither the duration nor the content of residents' perceived responsibility was
A loglinear item response theory (IRT) model is proposed that relates polytomously scored item responses to a multidimensional latent space. Each item may have a different response function where each item response may be explained by one or more latent traits. Item response functions may follow a
Item response theory (IRT) is a framework for modeling and analyzing item response ... data. Though, there is an argument that the evaluation of fit in IRT modeling has been ... National Council on Measurement in Education ... model data fit should be based on three types of ... prediction should be assessed through the.
Zampetakis, Leonidas A.; Lerakis, Manolis; Kafetsios, Konstantinos; Moustakis, Vassilis
In the present research we used item response theory (IRT) to examine whether effective predictions (anticipated affect) conforms to a typical (i.e., what people usually do) or a maximal behavior process (i.e., what people can do). The former, correspond to non-monotonic ideal point IRT models whereas the latter correspond to monotonic dominance IRT models. A convenience, cross-sectional student sample (N=1624) was used. Participants were asked to report on anticipated positive and negative a...
Psoter, Walter J; Glotzer, David L; Weiserbs, Kera Fay; Baek, Linda S; Karloopia, Rajiv
A study was completed to assess the academic and state-level professional optometry leadership views regarding optometry professionals as surge responders in the event of a catastrophic event. A cross-sectional survey was conducted using a 21-question, self-administered, structured questionnaire. All U.S. optometry school deans and state optometric association presidents were mailed a questionnaire and instructions to return it by mail on completion; 2 repeated mailings were made. Descriptive statistics were produced and differences between deans and association presidents were tested by Fisher exact test. The questionnaire response rate was 50% (25 returned/50 sent) for the state association presidents and 65% (11/17) for the deans. There were no statistically significant differences between the leadership groups for any survey questions. All agreed that optometrists have the skills, are ethically obligated to help, and that optometrists should receive additional training for participation in disaster response. There was general agreement that optometrists should provide first-aid, obtain medical histories, triage, maintain infection control, manage a point of distribution, prescribe medications, and counsel the "worried well." Starting intravenous lines, interpreting radiographs, and suturing were less favorably supported. There was some response variability between the 2 leadership groups regarding potential sources for training. The overall opinion of optometry professional leadership is that with additional training, optometrists can and should provide an important reserve pool of catastrophic event responders. Copyright © 2011 American Optometric Association. Published by Elsevier Inc. All rights reserved.
In September 2004 the Nordic Council of Ministers asked Nordel to perform some tasks and present the results to the Council on 1 March 2005. One of the tasks is to survey how system responsibility is defined and executed in the different Nordic countries. According to the Nordic Council of Ministers, the survey shall illuminate similarities and differences between the countries and assess the reasons for the differences. Nordel is asked to present a joint view system responsibility in the Nordic countries. Among other things, the responsibility for the system operators and the participants in the market shall be defined. The definition shall also include the distribution of costs between costs for network business and costs for business in competition. This shall be done in a way that creates a common platform for the further harmonisation work and continuous positive development of the Nordic electricity market. It is also important to identify the need for changes in e.g. legislation and guidelines in the different countries as a consequence of an implementation of a common definition in the Nordic countries. Areas to be included in the task are among others, balance settlement, security of supply, congestion management and system services. (BA)
Fajrianthi; Zein, Rizqy Amelia
This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (level of ability = -2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth to note that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
Evaluation of mode equivalence of the MSKCC Bowel Function Instrument, LASA Quality of Life, and Subjective Significance Questionnaire items administered by Web, interactive voice response system (IVRS), and paper.
Bennett, Antonia V; Keenoy, Kathleen; Shouery, Marwan; Basch, Ethan; Temple, Larissa K
To assess the equivalence of patient-reported outcome (PRO) survey responses across Web, interactive voice response system (IVRS), and paper modes of administration. Postoperative colorectal cancer patients with home Web/e-mail and phone were randomly assigned to one of the eight study groups: Groups 1-6 completed the survey via Web, IVRS, and paper, in one of the six possible orders; Groups 7-8 completed the survey twice, either by Web or by IVRS. The 20-item survey, including the MSKCC Bowel Function Instrument (BFI), the LASA Quality of Life (QOL) scale, and the Subjective Significance Questionnaire (SSQ) adapted to bowel function, was completed from home on consecutive days. Mode equivalence was assessed by comparison of mean scores across modes and intraclass correlation coefficients (ICCs) and was compared to the test-retest reliability of Web and IVRS. Of 170 patients, 157 completed at least one survey and were included in analysis. Patients had mean age 56 (SD = 11), 53% were male, 81% white, 53% colon, and 47% rectal cancer; 78% completed all assigned surveys. Mean scores for BFI total score, BFI subscale scores, LASA QOL, and adapted SSQ varied by mode by less than one-third of a score point. ICCs across mode were: BFI total score (Web-paper = 0.96, Web-IVRS = 0.97, paper-IVRS = 0.97); BFI subscales (range = 0.88-0.98); LASA QOL (Web-paper = 0.98, Web-IVRS = 0.78, paper-IVRS = 0.80); and SSQ (Web-paper = 0.92, Web-IVRS = 0.86, paper-IVRS = 0.79). Mode equivalence was demonstrated for the BFI total score, BFI subscales, LASA QOL, and adapted SSQ, supporting the use of multiple modes of PRO data capture in clinical trials.
Full Text Available Although 80% of persons with disabilities live in low and middle-income countries, there is still a lack of comprehensive, cross-culturally validated tools to identify persons facing activity limitations and functioning difficulties in these settings. In absence of such a tool, disability estimates vary considerably according to the methodology used, and policies are based on unreliable estimates.The Disability Screening Questionnaire composed of 27 items (DSQ-27 was initially designed by a group of international experts in survey development and disability in Afghanistan for a national survey. Items were selected based on major domains of activity limitations and functioning difficulties linked to an impairment as defined by the International Classification of Functioning, Disability and Health. Face, content and construct validity, as well as sensitivity and specificity were examined. Based on the results obtained, the tool was subsequently refined and expanded to 34 items, tested and validated in Darfur, Sudan. Internal consistency for the total DSQ-34 using a raw and standardized Cronbach's Alpha and within each domain using a standardized Cronbach's Alpha was examined in the Asian context (India and Nepal. Exploratory factor analysis (EFA using principal axis factoring (PAF evaluated the lowest number of factors to account for the common variance among the questions in the screen. Test-retest reliability was determined by calculating intraclass correlation (ICC and inter-rater reliability by calculating the kappa statistic; results were checked using Bland-Altman plots. The DSQ-34 was further tested for standard error of measurement (SEM and for the minimum detectable change (MDC. Good internal consistency was indicated by Cronbach's Alpha of 0.83/0.82 for India and 0.76/0.78 for Nepal. We confirmed our assumption for EFA using the Kaiser-Meyer-Olkin measure of sampling well above the accepted cutoff of 0.40 for India (0.82 and Nepal (0
Trani, Jean-François; Babulal, Ganesh Muneshwar; Bakhshi, Parul
Although 80% of persons with disabilities live in low and middle-income countries, there is still a lack of comprehensive, cross-culturally validated tools to identify persons facing activity limitations and functioning difficulties in these settings. In absence of such a tool, disability estimates vary considerably according to the methodology used, and policies are based on unreliable estimates. The Disability Screening Questionnaire composed of 27 items (DSQ-27) was initially designed by a group of international experts in survey development and disability in Afghanistan for a national survey. Items were selected based on major domains of activity limitations and functioning difficulties linked to an impairment as defined by the International Classification of Functioning, Disability and Health. Face, content and construct validity, as well as sensitivity and specificity were examined. Based on the results obtained, the tool was subsequently refined and expanded to 34 items, tested and validated in Darfur, Sudan. Internal consistency for the total DSQ-34 using a raw and standardized Cronbach's Alpha and within each domain using a standardized Cronbach's Alpha was examined in the Asian context (India and Nepal). Exploratory factor analysis (EFA) using principal axis factoring (PAF) evaluated the lowest number of factors to account for the common variance among the questions in the screen. Test-retest reliability was determined by calculating intraclass correlation (ICC) and inter-rater reliability by calculating the kappa statistic; results were checked using Bland-Altman plots. The DSQ-34 was further tested for standard error of measurement (SEM) and for the minimum detectable change (MDC). Good internal consistency was indicated by Cronbach's Alpha of 0.83/0.82 for India and 0.76/0.78 for Nepal. We confirmed our assumption for EFA using the Kaiser-Meyer-Olkin measure of sampling well above the accepted cutoff of 0.40 for India (0.82) and Nepal (0.82). The
Shou, Juan; Ren, Limin; Wang, Haitang; Yan, Fei; Cao, Xiaoyun; Wang, Hui; Wang, Zhiliang; Zhu, Shanzhu; Liu, Yao
The 12-item Short-Form Health Survey (SF-12) is the abridged practical version of SF-36. This cross-sectional study was aimed to assess the reliability and validity of SF-12 for the health status of Chinese community elderly population. The Chinese community elderly people in Xujiahui district of Shanghai were investigated. The internal consistency reliability was assessed using Cronbach's alpha and split-half reliability coefficients. Construct validity was analyzed using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Spearman's correlation coefficient (ρ) was used for the evaluation of criterion, convergent, and discriminant validity with Spearman's ρ ≥ 0.4 as satisfactory. Comparisons of the SF-12 summary scores among populations that differed in demographics were performed for discriminant validity. Total 1343 individuals aged ≥60 and reliability coefficient (0.812) reflected satisfactory internal consistency reliability of SF-12. EFA extracted a two-factor model (physical and mental health). About 60.7 % of the total variance was explained by the two factors. CFA showed that the two-factor solution provided a good fit to the data. Good convergent validity and discriminant validity of SF-12 were proved by the correction analyses (Spearman's ρ > 0.4) and the comparisons of the SF-12 summary scores among populations (P 0.4, P reliability and validity in measuring health status of Chinese community elderly population in Xujiahui district of Shanghai.
Assumpção, Ana; Pagano, Tatiana; Matsutani, Luciana A; Ferreira, Elizabeth A G; Pereira, Carlos A B; Marques, Amélia P
Fibromyalgia is a painful syndrome characterized by widespread chronic pain and associated symptoms with a negative impact on quality of life. Considering the subjectivity of quality of life measurements, the aim of this study was to verify the discriminating power of two quality of life questionnaires in patients with fibromyalgia: the generic Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) and the specific Fibromyalgia Impact Questionnaire (FIQ). A cross-sectional study was conducted on 150 participants divided into Fibromyalgia Group (FG) and Control Group (CG) (n=75 in each group). The participants were evaluated using the SF-36 and the FIQ. The data were analyzed by the Student t-test (α=0.05) and inferential analysis using the Receiver Operating Characteristics (ROC) Curve--sensitivity, specificity and area under the curve (AUC). The significance level was 0.05. The sample was similar for age (CG: 47.8 ± 8.1; FG: 47.0 ± 7.7 years). A significant difference was observed in quality of life assessment in all aspects of both questionnaires (pquality of life in fibromyalgia patients, and we suggest that both should be used in parallel because they evaluate relevant and complementary aspects of quality of life.
Jacob, Robin Tepper; Jacob, Brian
Teacher and principal surveys are among the most common data collection techniques employed in education research. Yet there is remarkably little research on survey methods in education, or about the most cost-effective way to raise response rates among teachers and principals. In an effort to explore various methods for increasing survey response…
Nov 5, 2014 ... Key words: Classical test theory, item analysis, item difficulty, item discrimination, item response theory, reliability ... the probability of answering an item correctly or of attaining ..... A Monte Carlo comparison of item and person.
Axinn, William G; Link, Cynthia F; Groves, Robert M
To address declining response rates and rising data-collection costs, survey methodologists have devised new techniques for using process data ("paradata") to address nonresponse by altering the survey design dynamically during data collection. We investigate the substantive consequences of responsive survey design-tools that use paradata to improve the representative qualities of surveys and control costs. By improving representation of reluctant respondents, responsive design can change our understanding of the topic being studied. Using the National Survey of Family Growth Cycle 6, we illustrate how responsive survey design can shape both demographic estimates and models of demographic behaviors based on survey data. By juxtaposing measures from regular and responsive data collection phases, we document how special efforts to interview reluctant respondents may affect demographic estimates. Results demonstrate the potential of responsive survey design to change the quality of demographic research based on survey data.
Tasca, Giorgio A; Cabrera, Christine; Kristjansson, Elizabeth; MacNair-Semands, Rebecca; Joyce, Anthony S; Ogrodniczuk, John S
We tested a very brief version of the 23-item Therapeutic Factors Inventory-Short Form (TFI-S), and describe the use of Item Response Theory (IRT) for the purpose of developing short and reliable scales for group psychotherapy. Group therapy patients (N = 578) completed the TFI-S on one occasion, and their data were used for the IRT analysis. Of those, 304 completed the TFI-S and other measures on more than one occasion to assess sensitivity to change, concurrent, and predictive validity of the brief version. Results suggest that the new TFI-8 is a brief, reliable, and valid measure of a higher-order group therapeutic factor. The TFI-8 may be used for continuous process measurement and feedback to improve the functioning of therapy groups.
Buatois, Simon; Retout, Sylvie; Frey, Nicolas; Ueckert, Sebastian
This manuscript aims to precisely describe the natural disease progression of Parkinson's disease (PD) patients and evaluate approaches to increase the drug effect detection power. An item response theory (IRT) longitudinal model was built to describe the natural disease progression of 423 de novo PD patients followed during 48 months while taking into account the heterogeneous nature of the MDS-UPDRS. Clinical trial simulations were then used to compare drug effect detection power from IRT and sum of item scores based analysis under different analysis endpoints and drug effects. The IRT longitudinal model accurately describes the evolution of patients with and without PD medications while estimating different progression rates for the subscales. When comparing analysis methods, the IRT-based one consistently provided the highest power. IRT is a powerful tool which enables to capture the heterogeneous nature of the MDS-UPDRS.
Trespalacios, Jesús H.; Perkins, Ross A.
Individual strategies to increase response rate and survey completion have been extensively researched. Recently, efforts have been made to investigate a combination of interventions to yield better response rates for web-based surveys. This study examined the effects of four different survey invitation conditions on response rate. From a large…
Zampetakis, Leonidas A; Lerakis, Manolis; Kafetsios, Konstantinos; Moustakis, Vassilis
In the present research, we used item response theory (IRT) to examine whether effective predictions (anticipated affect) conforms to a typical (i.e., what people usually do) or a maximal behavior process (i.e., what people can do). The former, correspond to non-monotonic ideal point IRT models, whereas the latter correspond to monotonic dominance IRT models. A convenience, cross-sectional student sample (N = 1624) was used. Participants were asked to report on anticipated positive and negative affect around a hypothetical event (emotions surrounding the start of a new business). We carried out analysis comparing graded response model (GRM), a dominance IRT model, against generalized graded unfolding model, an unfolding IRT model. We found that the GRM provided a better fit to the data. Findings suggest that the self-report responses to anticipated affect conform to dominance response process (i.e., maximal behavior). The paper also discusses implications for a growing literature on anticipated affect.
von Kobyletzki, Laura B; Thomas, Kim S; Schmitt, Jochen; Chalmers, Joanne R; Deckert, Stefanie; Aoki, Valeria; Weisshaar, Elke; Ojo, Jumoke Ahubelem; Svensson, Åke
This study investigated the perspective of international patients on individual symptoms of atopic dermatitis (eczema) in determining treatment response. A questionnaire was developed to evaluate the importance of symptoms from the patient's perspective. Patients were asked: "How important are these features in deciding whether or not a treatment is working?", and rated symptoms on a 5-point Likert scale. Patients were approached via Harmonising Outcome Measures for Eczema (HOME) collaborators and self-selected to take part in the on-line survey. Patients from 34 countries (n = 1,111) completed the survey; of these, 423 (38.3%) were parents of children with eczema. Nine items were rated as being "quite important" or "very important" by more than 80% of the respondents: itch, pain/soreness, skin feels hot or inflamed, bleeding, involvement of visible or sensitive body sites, cracks, sleep difficulties, amount of body affected, and weeping/oozing. These results may be of use in determining the face validity of scales from a cross-cultural patients' perspective.
Laddu, Deepika R; Wertheim, Betsy C; Garcia, David O; Woods, Nancy F; LaMonte, Michael J; Chen, Bertha; Anton-Culver, Hoda; Zaslavsky, Oleg; Cauley, Jane A; Chlebowski, Rowan; Manson, JoAnn E; Thomson, Cynthia A; Stefanick, Marcia L
To compare the value of clinically measured gait speed with that of the self-reported Medical Outcomes Study 36-item Short-Form Survey Physical Function Index (SF-36 PF) in predicting future preclinical mobility disability (PCMD) in older women. Prospective cohort study. Forty clinical centers in the United States. Women aged 65 to 79 enrolled in the Women's Health Initiative Clinical Trials with gait speed and SF-36 assessed at baseline (1993-1998) and follow-up Years 1, 3, and 6 (N = 3,587). Women were categorized as nondecliners or decliners based on changes (from baseline to Year 1) in gait speed and SF-36 PF scores. Logistic regression models were used to estimate incident PCMD (gait speed 36 PF with that of measured gait speed. Slower baseline gait speed and lower SF-36 PF scores were associated with higher adjusted odds of PCMD at Years 3 and 6 (all P 36, decliners were 1.42 times as likely to have developed PCMD by Year 3 and 1.49 times as likely by Year 6. Baseline gait speed (AUC = 0.713) was nonsignificantly better than SF-36 (AUC = 0.705) at predicting PCMD over 6 years (P = .21); including measures at a second time point significantly improved model discrimination for predicting PCMD (all P 36 PF did, although the results may be limited given that gait speed served as a predictor and to define the PCMD outcome. Nonetheless, monitoring trajectories of change in mobility are better predictors of future mobility disability than single measures. © 2018, Copyright the Authors Journal compilation © 2018, The American Geriatrics Society.
Reis, Ana Luiza; Reis, Leonardo Oliveira; Saade, Ricardo Destro; Santos, Carlos Alberto; Lima, Marcelo Lopes de; Fregonesi, Adriano
To validate the Quality of Erection Questionnaire (QEQ) considering Brazilian social-cultural aspects. To determine equivalence between the Portuguese and the English QEQ versions, the Portuguese version was back-translated by two professors who are native English speakers. After language equivalence had been determined, urologists considered the QEQ Portuguese version suitable. Men with self-reported erectile dysfunction (ED) and infertile men who had a stable sexual relationship for at least 6 months were invited to answer the QEQ, the International Index of Erectile Function (IIEF) and the RAND 36-Item Health Survey (RAND-36). The questionnaires were presented together and answered without help in a private room. Internal consistency (Cronbach's α), test-retest reliability (Spearman), convergent validity (Spearman correlation) coefficients and known-groups validity (the ability of the QEQ Portuguese version to differentiate erectile dysfunction severity groups) were assessed. We recruited 197 men (167 ED patients and 30 non-ED patients), mean age of 53.3 and median of 55.5 years (23-82 years). The Portuguese version of the QEQ had high internal consistency (Cronbach α=0.93), high stability between test and retest (ICC 0.83, with IC 95%: 0.76-0.88, pPortuguese version presented good psychometric properties and high convergent validity in relation to IIEF. The low correlations between the QEQ and the RAND-36, as well as between the IIEF and the RAND-36 indicated IIEF and QEQ specificity, which may have resulted from the patients' psychological adaptations that minimized the impact of ED on Quality of Life (QoL) and reestablished the well-being feeling.
Fossati, Andrea; Widiger, Thomas A; Borroni, Serena; Maffei, Cesare; Somma, Antonella
To extend the evidence on the reliability and construct validity of the Five-Factor Model Rating Form (FFMRF) in its self-report version, two independent samples of Italian participants, which were composed of 510 adolescent high school students and 457 community-dwelling adults, respectively, were administered the FFMRF in its Italian translation. Adolescent participants were also administered the Italian translation of the Borderline Personality Features Scale for Children-11 (BPFSC-11), whereas adult participants were administered the Italian translation of the Triarchic Psychopathy Measure (TriPM). Cronbach α values were consistent with previous findings; in both samples, average interitem r values indicated acceptable internal consistency for all FFMRF scales. A multidimensional graded item response theory model indicated that the majority of FFMRF items had adequate discrimination parameters; information indices supported the reliability of the FFMRF scales. Both categorical (i.e., item-level) and scale-level regression analyses suggested that the FFMRF scores may predict a nonnegligible amount of variance in the BPFSC-11 total score in adolescent participants, and in the TriPM scale scores in adult participants.
Oliveira-Kumakura, Ana Railka de Souza; de Araujo, Thelma Leite; Costa, Alice Gabrielle de Sousa; Cavalcante, Tahissa Frota; Lopes, Marcos Venícios de Oliveira; Carvalho, Emilia Campos
To validate clinically the nursing outcome "Swallowing status". The adjustment of the nursing outcome was investigated according to the Classical and Item Response Theories. The models were compared regarding information loss, goodness-of-fit, and differential item functioning. Stability and internal consistency were examined. The nursing outcome has the best fit in the generalized partial credit model with different discrimination parameters. Strong correlations among the scores of each indicator were observed. There was no differential item functioning of the outcome indicators. The scale presented high internal consistency (Cronbach's α = .954) and stability (and > .800). This study presents a valid nursing outcome. Most accurate monitoring of sensitivity to an intervention. Validar clinicamente o resultado de enefermagem "Estado da Deglutição". MÉTODOS: O ajustamento do resultado foi investigado de acordo com as teorias Clássica e de Resposta ao Item. Os modelos foram comparados assumindo parâmetros de itens cruzados de igual discriminação. Investigaram-se as propriedades de bondade do ajuste, funcionamento diferencial dos itens, estabilidade e consistência interna. O resultado se ajustou melhor a partir do Modelo de crédito parcial generalizado, o qual demonstrou unidimensionalidade do resultado e forte correlação entre os escores de cada indicador. Não houve funcionamento diferencial dos indicadores. A consistência interna para a escala global (Cronbach's α = .954) e a estabilidade (>.800) mantiveram-se elevadas. CONCLUSÃO: O estudo apresenta um resultado de enfermagem válido. RELEVÂNCIA PARA A PRÁTICA CLÍNICA: Maior acurácia para monitorar a sensibilidade da intervenção. © 2017 NANDA International, Inc.
Chatfield, Sheryl L; Nolan, Rachael; Crawford, Hannah; Hallam, Jeffrey S
Hand hygiene is promoted as an effective practice to counter health care-acquired infections; however, compliance is less than optimal. Nurses have many patient contact opportunities and therefore are frequent participants in intervention research. The optimal combination of efficient and effective intervention components has not been conclusively identified. A factorial survey research design offers an efficient method to assess multiple factors simultaneously by combining elements into vignettes. This article describes a process, grounded in the framework of Bandura's social cognitive theory, that explored environmental and individual factors that potentially influence nurses' hand hygiene behavior in acute care settings. Survey respondents consisted of nurses employed in patient care; respondents also could address an open response item. A total of 466 participants scored a total of 3,685 vignettes. Statistically significant parameters included goal, supervisor priority, electronic monitoring, and rewards. The most frequently mentioned open response item was the need to keep hand hygiene product dispensers refilled. Participants also suggested that culture and intrinsic motivation influenced hand hygiene behavior. Researchers might consider assessing promising factors, especially use of goal setting, as an intervention rather than as components of an intervention. Further research is indicated to better understand how nurses define and view hand hygiene culture. Copyright © 2017 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J
Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
Kunicki, Zachary J; Schick, Melissa R; Spillane, Nichea S; Harlow, Lisa L
Those who binge drink are at increased risk for alcohol-related consequences when compared to non-binge drinkers. Research shows individuals may face barriers to reducing their drinking behavior, but few measures exist to assess these barriers. This study created and validated the Barriers to Alcohol Reduction (BAR) scale. Participants were college students ( n = 230) who endorsed at least one instance of past-month binge drinking (4+ drinks for women or 5+ drinks for men). Using classical test theory, exploratory structural equation modeling found a two-factor structure of personal/psychosocial barriers and perceived program barriers. The sub-factors, and full scale had reasonable internal consistency (i.e., coefficient omega = 0.78 (personal/psychosocial), 0.82 (program barriers), and 0.83 (full measure)). The BAR also showed evidence for convergent validity with the Brief Young Adult Alcohol Consequences Questionnaire ( r = 0.39, p Theory (IRT) analysis showed the two factors separately met the unidimensionality assumption, and provided further evidence for severity of the items on the two factors. Results suggest that the BAR measure appears reliable and valid for use in an undergraduate student population of binge drinkers. Future studies may want to re-examine this measure in a more diverse sample.
During the survey, the respondents are often allowed to tick more than one answer option for a question. Analysis and visualization of such data have difficulties because of the need for processing multiple response variables. With standard representation such as pie and bar charts, information about the association between different answer options is lost. The author proposes a visualization approach for multiple response variables based on Venn diagrams. For a more informative representation with a large number of overlapping groups it is suggested to use similarity and association matrices. Some aggregate indicators of dissimilarity (similarity) are proposed based on the determinant of the similarity matrix and the maximum eigenvalue of association matrix. The application of the proposed approaches is well illustrated by the example of the analysis of advertising sources. Intersection of sets indicates that the same consumer audience is covered by several advertising sources. This information is very important for the allocation of the advertising budget. The differences between target groups in advertising sources are of interest. To identify such differences the hypothesis of homogeneity and independence are tested. Recent approach to the problem are briefly reviewed and compared. An alternative procedure is suggested. It is based on partition of a consumer audience into pairwise disjoint subsets and includes hypothesis testing of the difference between the population proportions. It turned out to be more suitable for the real problem being solved.
Vanessa Rufino da Silva
Full Text Available This study aimed to identify and produce through models of Item Response Theory (IRT a socio-economic indicator based in the items observed in 2000 Census, following the methodology by Soares (2005. By the IRT Methodology, this indicator, as a latent variable, is obtained through the construction of specific models and scales, making it possible to measure this variable, which according to Andrade et al. (2000, IRT analyzes each item which compose the measuring instrument. This case consists of binary or dichotomous items, which assess the possession of certain assets of domestic comfort. The characteristics of each item were analyzed, as the ability to discrimination and income necessary for the possession of certain property. It was concluded that with 13 items, a trustworthy questionnaire can be done for the construction of a socioeconomic index of Maringa’s metropolitan region.
Sikder, Mustafa; Daraz, Umar; Lantagne, Daniele; Saltori, Roberto
Water, sanitation, and hygiene (WASH) are immediate priorities for human survival and dignity in emergencies. In 2010, > 90% of Syrians had access to improved drinking water. In 2011, armed conflict began and currently 12 million people need WASH services. We analyzed data collected in southern Syria to identify effective WASH response activities for this context. Cross-sectional household surveys were conducted in 2016 and 2017 in 17 sub-districts of two governorates in opposition controlled southern Syria. During the survey, household water was tested for free chlorine residual (FCR). Descriptive statistics were calculated, and mixed effect logistic regressions were completed to determine associations between demographic and WASH variables with outcomes of FCR > 0.1 mg/L in household water and reported diarrhea in children market-available hygiene items were unaffordable. FCR > 0.1 mg/L increased from 4.1% to 27.9% over this time, with Water Safety Plan (WSP) programming strongly associated with FCR (mOR: 24.16; 95% CI: 5.93-98.5). The proportion of households with childhood diarrhea declined from 32.8% to 20.4% over this time; sanitation and hygiene access were protective against childhood diarrhea. The private sector has effectively replaced decaying infrastructure in Syria, although at high cost and uncertain quality. Allowing market forces to manage WASH services and quantity, and targeting emergency response activities on increasing affordability with well-targeted subsidies and improving water quality and regulation via WSPs can be an effective, scalable, and cost-effective strategy to guarantee water and sanitation access in protracted emergencies with local markets.
Online surveys are increasingly common due to the myriad of benefits they offer over traditional survey methods. However, research has shown that response rates to web-based surveys are typically lower than to traditional surveys and can possibly yield biased results. University-based faculty members are a unique cohort that may be ideally suited…
Edjolo, Arlette; Proust-Lima, Cécile; Delva, Fleur; Dartigues, Jean-François; Pérès, Karine
We aimed to describe the hierarchical structure of Instrumental Activities of Daily Living (IADL) and basic Activities of Daily Living (ADL) and trajectories of dependency before death in an elderly population using item response theory methodology. Data were obtained from a population-based French cohort study, the Personnes Agées QUID (PAQUID) Study, of persons aged ≥65 years at baseline in 1988 who were recruited from 75 randomly selected areas in Gironde and Dordogne. We evaluated IADL and ADL data collected at home every 2-3 years over a 24-year period (1988-2012) for 3,238 deceased participants (43.9% men). We used a longitudinal item response theory model to investigate the item sequence of 11 IADL and ADL combined into a single scale and functional trajectories adjusted for education, sex, and age at death. The findings confirmed the earliest losses in IADL (shopping, transporting, finances) at the partial limitation level, and then an overlapping of concomitant IADL and ADL, with bathing and dressing being the earliest ADL losses, and finally total losses for toileting, continence, eating, and transferring. Functional trajectories were sex-specific, with a benefit of high education that persisted until death in men but was only transient in women. An in-depth understanding of this sequence provides an early warning of functional decline for better adaptation of medical and social care in the elderly. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: firstname.lastname@example.org.
De Caluwé, Elien; Rettew, David C; De Clercq, Barbara
Various studies have shown that obsessive-compulsive symptoms exist as part of not only obsessive-compulsive disorder (OCD) but also obsessive-compulsive personality disorder (OCPD). Despite these shared characteristics, there is an ongoing debate on the inclusion of OCPD into the recently developed DSM-5 obsessive-compulsive and related disorders (OCRDs) category. The current study aims to clarify whether this inclusion can be justified from an item response theory approach. The validity of the continuity model for understanding the association between OCD and OCPD was explored in 787 Dutch community and referred adolescents (70% female, 12-20 years old, mean = 16.16, SD = 1.40) studied between July 2011 and January 2013, relying on item response theory (IRT) analyses of self-reported OCD symptoms (Youth Obsessive-Compulsive Symptoms Scale [YOCSS]) and OCPD traits (Personality Inventory for DSM-5 [PID-5]). The results support the continuity hypothesis, indicating that both OCD and OCPD can be represented along a single underlying spectrum. OCD, and especially the obsessive symptom domain, can be considered as the extreme end of OCPD traits. The current study empirically supports the classification of OCD and OCPD along a single dimension. This integrative perspective in OC-related pathology addresses the dimensional nature of traits and psychopathology and may improve the transparency and validity of assessment procedures. © Copyright 2014 Physicians Postgraduate Press, Inc.
Noerholm, V; Groenvold, M; Watt, T
BACKGROUND: The main objective of this study was to investigate the construct validity of the WHOQOL-BREF by use of Rasch and Item Response Theory models and to examine the stability of the model across high/low scoring individuals, gender, education, and depressive illness. Furthermore......, the objective of the study was to estimate the reference data for the quality of life questionnaire WHOQOL-BREF in the general Danish population and in subgroups defined by age, gender, and education. METHODS: Mail-out-mail-back questionnaires were sent to a randomly selected sample of the Danish general...... population. The response rate was 68.5%, and the sample reported here contained 1101 respondents: 578 women and 519 men (four respondents did not indicate their genders). RESULTS: Each of the four domains of the WHOQOL-BREF scale fitted a two-parameter IRT model, but did not fit the Rasch model. Due...
Kroeger; Ceervantes, Flores; Jones; Wilson
.... Survey Content Topics covered in the survey primarily addressed the post-service employment and earnings experiences of military retirees, with emphasis on the effects of combat- related disabilities...
Scott, N. W.; Fayers, P. M.; Aaronson, N. K.; Bottomley, A.; de Graeff, A.; Groenvold, M.; Koller, M.; Petersen, M. A.; Sprangers, M. A. G.
INTRODUCTION: The European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 is a widely used health-related quality of life instrument. The main aim of this study is to investigate whether there are international differences in response to the questionnaire that can be explained by
Chevalier, Nicolas; James, Tiffany D.; Wiebe, Sandra A.; Nelson, Jennifer Mize; Espy, Kimberly Andrews
The present study addressed whether developmental improvement in working memory span task performance relies upon a growing ability to proactively plan response sequences during childhood. Two hundred thirteen children completed a working memory span task in which they used a touchscreen to reproduce orally presented sequences of animal names.…
Kirke, Diana N.; Frucht, Steven J.; Simonyan, Kristina
Laryngeal dystonia (LD) is a task-specific focal dystonia of unknown pathophysiology affecting speech production. We examined the demographics of anecdotally reported alcohol use and its effects on LD symptoms using an online survey based on Research Electronic Data Capture (REDCap™) and National Spasmodic Dysphonia Association’s patient registry. From 641 participants, 531 were selected for data analysis, and 110 were excluded because of unconfirmed diagnosis. A total of 406 patients (76.5%) had LD and 125 (23.5%) had LD and voice tremor (LD/VT). The consumption of alcohol was reported by 374 LD (92.1%) and 109 LD/VT (87.2%) patients. Improvement of voice symptoms after alcohol ingestion was noted by 227 LD (55.9% of all patients) and 73 LD/VT (58.4%), which paralleled the improvement observed by patient’s family and/or friends in 214 LD (57.2%) and 69 LD/VT (63.3%) patients. The benefits lasted 1–3 hours in both groups with the maximum effect after 2 drinks in LD patients (p = 0.002), whereas LD/VT symptoms improved independent of the consumed amount (p = 0.48). Our data suggest that isolated dystonic symptoms, such as in LD, are responsive to alcohol intake and this responsiveness is not attributed to the presence of VT, which is known to have significant benefits from alcohol ingestion. Alcohol may modulate the pathophysiological mechanisms underlying abnormal neurotransmission of γ-aminobutyric acid (GABA) in dystonia and as such provide new avenues for novel therapeutic options in these patients. PMID:25929664
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models
Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.
Peyre, Hugo; Leplège, Alain; Coste, Joël
Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
Background Minimising participant non-response in postal surveys helps to maximise the generalisability of the inferences made from the data collected. The aim of this study was to examine the effect of questionnaire length, personalisation and reminder type on postal survey response rate and quality and to compare the cost-effectiveness of the alternative survey strategies. Methods In a pilot study for a population study of travel behaviour, physical activity and the environment, 1000 participants sampled from the UK edited electoral register were randomly allocated using a 2 × 2 factorial design to receive one of four survey packs: a personally addressed long (24 page) questionnaire pack, a personally addressed short (15 page) questionnaire pack, a non-personally addressed long questionnaire pack or a non-personally addressed short questionnaire pack. Those who did not return a questionnaire were stratified by initial randomisation group and further randomised to receive either a full reminder pack or a reminder postcard. The effects of the survey design factors on response were examined using multivariate logistic regression. Results An overall response rate of 17% was achieved. Participants who received the short version of the questionnaire were more likely to respond (OR = 1.48, 95% CI 1.06 to 2.07). In those participants who received a reminder, personalisation of the survey pack and reminder also increased the odds of response (OR = 1.44, 95% CI 1.01 to 1.95). Item non-response was relatively low, but was significantly higher in the long questionnaire than the short (9.8% vs 5.8%; p = .04). The cost per additional usable questionnaire returned of issuing the reminder packs was £23.1 compared with £11.3 for the reminder postcards. Conclusions In contrast to some previous studies of shorter questionnaires, this trial found that shortening a relatively lengthy questionnaire significantly increased the response. Researchers should consider the trade off
Peters, Lorna; Sunderland, Matthew; Andrews, Gavin; Rapee, Ronald M; Mattick, Richard P
Shortened forms of the Social Interaction Anxiety Scale (SIAS) and the Social Phobia Scale (SPS) were developed using nonparametric item response theory methods. Using data from socially phobic participants enrolled in 5 treatment trials (N = 456), 2 six-item scales (the SIAS-6 and the SPS-6) were developed. The validity of the scores on the SIAS-6 and the SPS-6 was then tested using traditional methods for their convergent validity in an independent clinical sample and a student sample, as well as for their sensitivity to change and diagnostic sensitivity in the clinical sample. The scores on the SIAS-6 and the SPS-6 correlated as well as the scores on the original SIAS and SPS, with scores on measures of related constructs, discriminated well between those with and without a diagnosis of social phobia, providing cutoffs for diagnosis and were as sensitive to measuring change associated with treatment as were the SIAS and SPS. Together, the SIAS-6 and the SPS-6 appear to be an efficient method of measuring symptoms of social phobia and provide a brief screening tool.
Somes Grant W
Full Text Available Abstract Background Direct-to-consumer (DTC marketing of pharmaceuticals is controversial, yet effective. Little is known relating patterns of medication use to patient responsiveness to DTC. Methods We conducted a secondary analysis of data collected in national telephone survey on knowledge of and attitudes toward DTC advertisements. The survey of 1081 U.S. adults (response rate = 65% was conducted by the Food and Drug Administration (FDA. Responsiveness to DTC was defined as an affirmative response to the item: "Has an advertisement for a prescription drug ever caused you to ask a doctor about a medical condition or illness of your own that you had not talked to a doctor about before?" Patients reported number of prescription and over-the-counter (OTC medicines taken as well as demographic and personal health information. Results Of 771 respondents who met study criteria, 195 (25% were responsive to DTC. Only 7% respondents taking no prescription were responsive, whereas 45% of respondents taking 5 or more prescription medications were responsive. This trend remained significant (p trend .0009 even when controlling for age, gender, race, educational attainment, income, self-reported health status, and whether respondents "liked" DTC advertising. There was no relationship between the number of OTC medications taken and the propensity to discuss health-related problems in response to DTC advertisements (p = .4. Conclusion There is a strong cross-sectional relationship between the number of prescription, but not OTC, drugs used and responsiveness to DTC advertising. Although this relationship could be explained by physician compliance with patient requests for medications, it is also plausible that DTC advertisements have a particular appeal to patients prone to taking multiple medications. Outpatients motivated to discuss medical conditions based on their exposure to DTC advertising may require a careful medication history to evaluate for
Dieringer, Nicholas J; Kukkamma, Lisa; Somes, Grant W; Shorr, Ronald I
ABSTRACT: BACKGROUND: Direct-to-consumer (DTC) marketing of pharmaceuticals is controversial, yet effective. Little is known relating patterns of medication use to patient responsiveness to DTC. METHODS: We conducted a secondary analysis of data collected in national telephone survey on knowledge of and attitudes toward DTC advertisements. The survey of 1081 U.S. adults (response rate = 65%) was conducted by the Food and Drug Administration (FDA). Responsiveness to DTC was defined as an affirmative response to the item: "Has an advertisement for a prescription drug ever caused you to ask a doctor about a medical condition or illness of your own that you had not talked to a doctor about before?" Patients reported number of prescription and over-the-counter (OTC) medicines taken as well as demographic and personal health information. RESULTS: Of 771 respondents who met study criteria, 195 (25%) were responsive to DTC. Only 7% respondents taking no prescription were responsive, whereas 45% of respondents taking 5 or more prescription medications were responsive. This trend remained significant (p trend .0009) even when controlling for age, gender, race, educational attainment, income, self-reported health status, and whether respondents "liked" DTC advertising. There was no relationship between the number of OTC medications taken and the propensity to discuss health-related problems in response to DTC advertisements (p = .4). CONCLUSION: There is a strong cross-sectional relationship between the number of prescription, but not OTC, drugs used and responsiveness to DTC advertising. Although this relationship could be explained by physician compliance with patient requests for medications, it is also plausible that DTC advertisements have a particular appeal to patients prone to taking multiple medications. Outpatients motivated to discuss medical conditions based on their exposure to DTC advertising may require a careful medication history to evaluate for therapeutic
Siskind, Theresa G.; Anderson, Lorin W.
The study was designed to examine the similarity of response options generated by different item writers using a systematic approach to item writing. The similarity of response options to student responses for the same item stems presented in an open-ended format was also examined. A non-systematic (subject matter expertise) approach and a…
Leonidas A Zampetakis
Full Text Available In the present research we used item response theory (IRT to examine whether effective predictions (anticipated affect conforms to a typical (i.e., what people usually do or a maximal behavior process (i.e., what people can do. The former, correspond to non-monotonic ideal point IRT models whereas the latter correspond to monotonic dominance IRT models. A convenience, cross-sectional student sample (N=1624 was used. Participants were asked to report on anticipated positive and negative affect around a hypothetical event (emotions surrounding the start of a new business. We carried out analysis comparing Graded Response Model (GRM, a dominance IRT model, against Generalized Graded Unfolding Model (GGUM, an unfolding IRT model. We found that the GRM provided a better fit to the data. Findings suggest that the self-report responses to anticipated affect conform to dominance response process (i.e. maximal behavior. The paper also discusses implications for a growing literature on anticipated affect.