WorldWideScience

Sample records for survey item responses

  1. Harmonizing Measures of Cognitive Performance Across International Surveys of Aging Using Item Response Theory.

    Science.gov (United States)

    Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D

    2015-12-01

    To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.

  2. A randomised trial and economic evaluation of the effect of response mode on response rate, response bias, and item non-response in a survey of doctors.

    Science.gov (United States)

    Scott, Anthony; Jeon, Sung-Hee; Joyce, Catherine M; Humphreys, John S; Kalb, Guyonne; Witt, Julia; Leahy, Anne

    2011-09-05

    Surveys of doctors are an important data collection method in health services research. Ways to improve response rates and to minimise survey response bias and item non-response, within a given budget, have not previously been addressed in the same study. The aim of this paper is to compare the effects and costs of three different modes of survey administration in a national survey of doctors. A stratified random sample of 4.9% (2,702/54,160) of doctors undertaking clinical practice was drawn from a national directory of all doctors in Australia. Stratification was by four doctor types (general practitioners, specialists, specialists-in-training, and hospital non-specialists) and by six rural/remote categories. A three-arm parallel trial design with equal randomisation across arms was used. Doctors were randomly allocated to: online questionnaire (902); simultaneous mixed mode (a paper questionnaire and login details sent together) (900); or sequential mixed mode (online followed by a paper questionnaire with the reminder) (900). Analysis was by intention to treat, as within each primary mode, doctors could choose either paper or online. Primary outcome measures were response rate, survey response bias, item non-response, and cost. The online mode had a response rate of 12.95%, followed by the simultaneous mixed mode with 19.7%, and the sequential mixed mode with 20.7%. After adjusting for observed differences between the groups, the online mode had a 7 percentage point lower response rate compared to the simultaneous mixed mode, and a 7.7 percentage point lower response rate compared to sequential mixed mode. The difference in response rate between the sequential and simultaneous modes was not statistically significant. Both mixed modes showed evidence of response bias, whilst the characteristics of online respondents were similar to the population. However, the online mode had a higher rate of item non-response compared to both mixed modes. The total cost of the online…

  3. Factors affecting study efficiency and item non-response in health surveys in developing countries: the Jamaica national healthy lifestyle survey

    Directory of Open Access Journals (Sweden)

    Bennett Franklyn

    2007-02-01

    Background: Health surveys provide important information on the burden and secular trends of risk factors and disease. Several factors, including survey and item non-response, can affect data quality. There are few reports from developing countries on efficiency, validity and the impact of item non-response. This report examines factors associated with item non-response and study efficiency in a national health survey in a developing Caribbean island. Methods: A national sample of participants aged 15–74 years was selected in a multi-stage sampling design accounting for 4 health regions and 14 parishes using enumeration districts as primary sampling units. Means and proportions of the variables of interest were compared between various categories. Non-response was defined as failure to provide an analyzable response. Linear and logistic regression models accounting for sample design and post-stratification weighting were used to identify independent correlates of recruitment efficiency and item non-response. Results: We recruited 2012 15–74 year-olds (66.2% females) at a response rate of 87.6% with significant variation between regions (80.9% to 97.6%; p […]). Conclusion: Informative health surveys are possible in developing countries. While survey response rates may be satisfactory, item non-response was high in respect of income and sexual practice. In contrast to developed countries, non-response to questions on income is higher and has different correlates. These findings can inform future surveys.

  4. A Mixed Effects Randomized Item Response Model

    Science.gov (United States)

    Fox, J.-P.; Wyrick, Cheryl

    2008-01-01

    The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
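
    The relationship between randomized and true item responses mentioned above can be illustrated with a minimal sketch. It assumes a forced-response design, which the abstract does not specify: with probability p_truth the respondent answers truthfully, and otherwise a randomizing device forces a "yes" with probability p_forced_yes. The function names and default values are hypothetical.

```python
def prob_observed_yes(true_prob, p_truth=0.8, p_forced_yes=0.5):
    """Probability of observing 'yes' under an assumed forced-response design.

    true_prob    -- probability that the true item response is 'yes'
    p_truth      -- probability the respondent is told to answer truthfully
    p_forced_yes -- probability of a forced 'yes' when not answering truthfully
    """
    return p_truth * true_prob + (1.0 - p_truth) * p_forced_yes


def recover_true_prob(observed_prob, p_truth=0.8, p_forced_yes=0.5):
    """Invert the design equation to recover the true-response probability."""
    return (observed_prob - (1.0 - p_truth) * p_forced_yes) / p_truth


if __name__ == "__main__":
    observed = prob_observed_yes(0.3)
    print(observed, recover_true_prob(observed))
```

    Because the design probabilities are known, the true-response probability is identifiable from the observed data even though no individual answer reveals the respondent's true response.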

  5. Selected response test items.

    Science.gov (United States)

    Tomey, A M

    1999-01-01

    Classroom assessment is complex and challenging. Teachers need to consider the cognitive, affective, and psychomotor levels for achievement of their educational objectives. This series of six articles discusses how to develop testing blue-prints; selected-response tests, including multiple-choice, true-false, matching, or other objective tests; completion or essay testing; problem solving/critical thinking activities; performance assessment; and computer-based testing.

  6. Modelling sequentially scored item responses

    NARCIS (Netherlands)

    Akkermans, W.

    2000-01-01

    The sequential model can be used to describe the variable resulting from a sequential scoring process. In this paper two more item response models are investigated with respect to their suitability for sequential scoring: the partial credit model and the graded response model. The investigation is c

  7. Generalizability theory and item response theory

    NARCIS (Netherlands)

    Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a

  8. Generalizability theory and item response theory

    NARCIS (Netherlands)

    Glas, C.A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a s

  9. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    Science.gov (United States)

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

  10. Assessing the Equivalence of Paper, Mobile Phone, and Tablet Survey Responses at a Community Mental Health Center Using Equivalent Halves of a 'Gold-Standard' Depression Item Bank.

    Science.gov (United States)

    Brodey, Benjamin B; Gonzalez, Nicole L; Elkin, Kathryn Ann; Sasiela, W Jordan; Brodey, Inger S

    2017-09-06

    The computerized administration of self-report psychiatric diagnostic and outcomes assessments has risen in popularity. If results are similar enough across different administration modalities, then new administration technologies can be used interchangeably and the choice of technology can be based on other factors, such as convenience in the study design. An assessment based on item response theory (IRT), such as the Patient-Reported Outcomes Measurement Information System (PROMIS) depression item bank, offers new possibilities for assessing the effect of technology choice upon results. The objectives were to create equivalent halves of the PROMIS depression item bank and to use these halves to compare survey responses and user satisfaction among administration modalities (paper, mobile phone, or tablet) with a community mental health care population. The 28 PROMIS depression items were divided into 2 halves based on content and simulations with an established PROMIS response data set. A total of 129 participants were recruited from an outpatient public sector mental health clinic based in Memphis. All participants took both nonoverlapping halves of the PROMIS IRT-based depression items (Part A and Part B): once using paper and pencil, and once using either a mobile phone or tablet. An 8-cell randomization was done on technology used, order of technologies used, and order of PROMIS Parts A and B. Both Parts A and B were administered as fixed-length assessments and both were scored using published PROMIS IRT parameters and algorithms. All 129 participants received either Part A or B via paper assessment. Participants were also administered the opposite assessment, 63 using a mobile phone and 66 using a tablet. There was no significant difference in item response scores for Part A versus B. All 3 of the technologies yielded essentially identical assessment results and equivalent satisfaction levels. Our findings show that the PROMIS depression assessment can be divided into 2 equivalent…

  11. Bayesian item fit analysis for unidimensional item response theory models.

    Science.gov (United States)

    Sinharay, Sandip

    2006-11-01

    Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model-checking method, a popular Bayesian model-checking tool, to examine item fit for the above-mentioned models. An item fit plot, comparing the observed and predicted proportion-correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p-values (which are natural Bayesian p-values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above-mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.

  12. Application of multidimensional item response theory models to longitudinal data

    NARCIS (Netherlands)

    Marvelde, te Janneke M.; Glas, Cees A.W.; Van Landeghem, Georges; Van Damme, Jan

    2006-01-01

    The application of multidimensional item response theory (IRT) models to longitudinal educational surveys where students are repeatedly measured is discussed and exemplified. A marginal maximum likelihood (MML) method to estimate the parameters of a multidimensional generalized partial credit model

  13. Identifying the ‘red flags’ for unhealthy weight control among adolescents: Findings from an item response theory analysis of a national survey

    Directory of Open Access Journals (Sweden)

    Utter Jennifer

    2012-08-01

    Background: Weight control behaviors are common among young people and are associated with poor health outcomes. Yet clinicians rarely ask young people about their weight control; this may be due to uncertainty about which questions to ask, specifically around whether certain weight loss strategies are healthy or unhealthy or about what weight loss behaviors are more likely to lead to adverse outcomes. Thus, the aims of the current study are: to confirm, using item response theory analysis, that the underlying latent constructs of healthy and unhealthy weight control exist; to determine the ‘red flag’ weight loss behaviors that may discriminate unhealthy from healthy weight loss; to determine the relationships between healthy and unhealthy weight loss and mental health; and to examine how weight control may vary among demographic groups. Methods: Data were collected as part of a national health and wellbeing survey of secondary school students in New Zealand (n = 9,107) in 2007. Item response theory analyses were conducted to determine the underlying constructs of weight control behaviors and the behaviors that discriminate unhealthy from healthy weight control. Results: The current study confirms that there are two underlying constructs of weight loss behaviors which can be described as healthy and unhealthy weight control. Unhealthy weight control was positively correlated with depressive mood. Fasting and skipping meals for weight loss had the lowest item thresholds on the unhealthy weight control continuum, indicating that they act as ‘red flags’ and warrant further discussion in routine clinical assessments. Conclusions: Routine assessments of weight control strategies by clinicians are warranted, particularly for screening for meal skipping and fasting for weight loss as these behaviors appear to ‘flag’ behaviors that are associated with poor mental wellbeing.

  14. Measuring response styles in Likert items.

    Science.gov (United States)

    Böckenholt, Ulf

    2017-03-01

    The recently proposed class of item response tree models provides a flexible framework for modeling multiple response processes. This feature is particularly attractive for understanding how response styles may affect answers to attitudinal questions. Facilitating the disassociation of response styles and attitudinal traits, item response tree models can provide powerful process tests of how different response formats may affect the measurement of substantive traits. In an empirical study, 3 response formats were used to measure the 2-dimensional Personal Need for Structure traits. Different item response tree models are proposed to capture the response styles for each of the response formats. These models show that the response formats give rise to similar trait measures but different response-style effects. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  15. Item response theory - A first approach

    Science.gov (United States)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
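
    The logistic models with one, two and three parameters named in this abstract can be sketched as follows. This is an illustration of the standard textbook forms, not code from the paper; the function names and the symbols theta (ability), a (discrimination), b (difficulty) and c (pseudo-guessing) are conventional choices assumed here.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic item response function.

    Returns the probability of a correct response for ability theta,
    given discrimination a, difficulty b and lower asymptote c.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def irf_2pl(theta, a, b):
    """Two-parameter model: the 3PL with no guessing (c = 0)."""
    return irf_3pl(theta, a=a, b=b, c=0.0)


def irf_1pl(theta, b):
    """One-parameter model: the 2PL with discrimination fixed at 1."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)


if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta, irf_1pl(theta, b=0.0), irf_3pl(theta, a=1.5, b=0.0, c=0.2))
```

    Writing the simpler models as special cases of the 3PL makes the nesting explicit: each added parameter relaxes one constraint of the model below it.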

  16. An item response theory analysis of DSM-IV diagnostic criteria for personality disorders: findings from the national epidemiologic survey on alcohol and related conditions.

    Science.gov (United States)

    Harford, Thomas C; Chen, Chiung M; Saha, Tulshi D; Smith, Sharon M; Hasin, Deborah S; Grant, Bridget F

    2013-01-01

    The purpose of this study was to evaluate the psychometric properties of DSM-IV symptom criteria for assessing personality disorders (PDs) in a national population and to compare variations in proposed symptom coding for social and/or occupational dysfunction. Data were obtained from a total sample of 34,653 respondents from Waves 1 and 2 of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). For each personality disorder, confirmatory factor analysis (CFA) established a 1-factor latent factor structure for the respective symptom criteria. A 2-parameter item response theory (IRT) model was applied to the symptom criteria for each PD to assess the probabilities of symptom item endorsements across different values of the underlying trait (latent factor). Findings were compared with a separate IRT model using an alternative coding of symptom criteria that requires distress/impairment to be related to each criterion. The CFAs yielded a good fit for a single underlying latent dimension for each PD. Findings from the IRT indicated that DSM-IV PD symptom criteria are clustered in the moderate to severe range of the underlying latent dimension for each PD and are peaked, indicating high measurement precision only within a narrow range of the underlying trait and lower measurement precision at lower and higher levels of severity. Compared with the NESARC symptom coding, the IRT results for the alternative symptom coding are shifted toward the more severe range of the latent trait but generally have lower measurement precision for each PD. The IRT findings provide support for a reliable assessment of each PD for both NESARC and alternative coding for distress/impairment. The use of symptom dysfunction for each criterion, however, raises a number of issues and implications for the DSM-5 revision currently proposed for Axis II disorders (American Psychiatric Association, 2010).

  17. Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

    Science.gov (United States)

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…

  18. Measuring student learning with item response theory

    Directory of Open Access Journals (Sweden)

    Young-Jin Lee

    2008-01-01

    We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics course. We show that after tutoring a shifted logistic item response function with lower discrimination fits the students’ second responses to an item previously answered incorrectly. Student skill decreased by 1.0 standard deviation when students used no tutoring between their (incorrect) first and second attempts, which we attribute to “item-wrong bias.” On average, using hints or feedback increased students’ skill by 0.8 standard deviation. A skill increase of 1.9 standard deviations was observed when hints were requested after viewing, but prior to attempting to answer, a particular item. The skill changes measured in this way will enable the use of IRT to assess students based on their second attempt in a tutoring environment.

  19. Item response theory for measurement validity.

    Science.gov (United States)

    Yang, Frances M; Kao, Solon T

    2014-06-01

    Item response theory (IRT) is an important method of assessing the validity of measurement scales that is underutilized in the field of psychiatry. IRT describes the relationship between a latent trait (e.g., the construct that the scale proposes to assess), the properties of the items in the scale, and respondents' answers to the individual items. This paper introduces the basic premise, assumptions, and methods of IRT. To help explain these concepts we generate a hypothetical scale using three items from a modified, binary (yes/no) response version of the Center for Epidemiological Studies-Depression scale that was administered to 19,399 respondents. We first conducted a factor analysis to confirm the unidimensionality of the three items and then proceeded with Mplus software to construct the 2-Parameter Logistic (2-PL) IRT model of the data, a method which allows for estimates of both item discrimination and item difficulty. The utility of this information both for clinical purposes and for scale construction purposes is discussed.

  20. A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

    Directory of Open Access Journals (Sweden)

    James A Wiley

    We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.

  1. A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

    Science.gov (United States)

    Wiley, James A; Martin, John Levi; Herschkorn, Stephen J; Bond, Jason

    2015-01-01

    We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.
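
    As background for this abstract, the classical binomial error model with a two-parameter Beta distribution for the latent response probability yields a beta-binomial distribution for the number of positive responses. The sketch below illustrates only that classical starting point, not the authors' extension; the function names and parameter symbols are assumptions.

```python
import math


def beta_fn(x, y):
    """Beta function B(x, y), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def beta_binomial_pmf(k, n, alpha, beta):
    """P(k positive responses out of n items) when the latent response
    probability follows Beta(alpha, beta) and responses are binomial
    given that probability."""
    return math.comb(n, k) * beta_fn(k + alpha, n - k + beta) / beta_fn(alpha, beta)


if __name__ == "__main__":
    n = 10
    pmf = [beta_binomial_pmf(k, n, alpha=2.0, beta=3.0) for k in range(n + 1)]
    print(pmf, sum(pmf))
```

    With alpha = beta = 1 the latent probability is uniform and every score from 0 to n is equally likely, which gives a quick sanity check on the formula.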

  2. Analysis of Individual "Test Of Astronomy STandards" (TOAST) Item Responses

    Science.gov (United States)

    Slater, Stephanie J.; Schleigh, Sharon Price; Stork, Debra J.

    2015-01-01

    The development of valid and reliable strategies to efficiently determine the knowledge landscape of introductory astronomy college students is an effort of great interest to the astronomy education community. This study examines individual item response rates from a widely used conceptual understanding survey, the Test Of Astronomy Standards…

  3. Extending Item Response Theory to Online Homework

    CERN Document Server

    Kortemeyer, Gerd

    2014-01-01

    Item Response Theory is becoming an increasingly important tool for analyzing "Big Data" gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data, and then proceeds to investigate the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination are robust over wide ranges with respect to model assumptions and introduced noise, less so than item difficulty.

  4. A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

    Science.gov (United States)

    Fukuhara, Hirotaka; Kamata, Akihito

    2011-01-01

    A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

  5. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test.

    Science.gov (United States)

    Drachler, Maria de Lourdes; Marshall, Tom; de Carvalho Leite, José Carlos

    2007-03-01

    A method for translating research data from the Denver Test into individual scores of developmental status measured in a continuous scale is presented. It was devised using the Denver Developmental Screening Test (DDST) but can be used for Denver II. The DDST was applied in a community-based survey of 3389 under-5-year-olds in Porto Alegre, Brazil. The items of success were standardised by logistic regression on log chronological age. Each child's ability age was then estimated by maximum likelihood as the age in this reference population corresponding to the child's success and failures in the test. The score of developmental status is the natural logarithm of this ability age divided by chronological age and thus measures the delay or advance in the child's ability age compared with chronological age. This method estimates development status using both difficulty and discriminating power of each item in the reference population, an advantage over scores based on total number of items correctly performed or failed, which depend on difficulty only. The score corresponds with maternal opinion of child developmental status and with the 3-category scale of the DDST. It shows good construct validity, indicated by symmetrical and homogeneous variability from 3 months upwards, and reasonable results in describing gender differences in development by age, the mean score increasing with socio-economic conditions and diminishing among low-birthweight children. If a standardised measure of development status (z-scores) is required, this can be obtained by dividing the score by its standard deviation. Concurrent and discriminant validity of the score must be examined in further studies.
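
    The score described in this abstract, the natural logarithm of ability age divided by chronological age, can be written as a short worked example. The function name and the use of months as the age unit are assumptions made for illustration; estimating the ability age itself (by maximum likelihood from item successes and failures) is not shown.

```python
import math


def developmental_score(ability_age_months, chronological_age_months):
    """Natural log of the ratio of ability age to chronological age.

    Zero means the child performs at the level typical for their age;
    negative values indicate delay and positive values indicate advance.
    """
    if ability_age_months <= 0 or chronological_age_months <= 0:
        raise ValueError("ages must be positive")
    return math.log(ability_age_months / chronological_age_months)


if __name__ == "__main__":
    # A 24-month-old performing like a typical 18-month-old.
    print(developmental_score(18.0, 24.0))
```

    Because the score is a log ratio, the same proportional delay gives the same score at any age, so scores are comparable across the age range.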

  6. Stability of Differential Item Functioning over a Single Population in Survey Data

    Science.gov (United States)

    Dodeen, Hamzeh

    2004-01-01

    This study investigates the stability of differential item functioning (DIF) in survey data. Surveys are conducted periodically, and their results are often reported by aggregating responses. Estimating the stability of DIF across subsets of a survey population can be an important indicator in determining the likelihood of DIF stability over…

  7. Teoria da Resposta ao Item / Teoria de la respuesta al item / Item response theory

    Directory of Open Access Journals (Sweden)

    Eutalia Aparecida Candido de Araujo

    2009-12-01

    The concern with measuring psychological traits is long-standing, and many studies and proposed methods have been developed to achieve this goal. Among these proposals, Item Response Theory (IRT) stands out; it originally arose to address limitations of Classical Test Theory, which is still widely used today in the measurement of psychological traits. The main point of IRT is that it takes each item into account individually, without emphasizing the total scores; therefore, the conclusions depend not only on the test or questionnaire but on each item that composes it. This article aims to present this theory, which revolutionized measurement theory.

  8. Examining item difficulty and response time on perceptual ability test items.

    Science.gov (United States)

    Yang, Chien-Lin; O'Neill, Thomas R; Kramer, Gene A

    2002-01-01

    This study examined item calibration stability in relation to response time, and the levels of item difficulty between different response time groups, on a sample of 389 examinees responding to items from six subtests of the Perceptual Ability Test (PAT). The results indicated that no Differential Item Functioning (DIF) was found and that item difficulty estimates were significantly correlated between slow and fast responders. Three distinct levels of difficulty emerged among the six subtests across groups. Slow responders spent significantly more time than fast responders on the four most difficult subtests. A significant positive relationship was found between item difficulty and response time across groups on the overall perceptual ability test items. Overall, this study found that: 1) the same underlying construct is being measured across groups, 2) the PAT scores were equally useful across groups, 3) different sources of item difficulty may exist among the six subtests, and 4) more difficult test items may require more time to answer.

  9. AN ITEM RESPONSE MODEL WITH SINGLE PEAKED ITEM CHARACTERISTIC CURVES - THE PARELLA MODEL

    NARCIS (Netherlands)

    HOIJTINK, H; MOLENAAR

    In this paper an item response model (the PARELLA model) designed specifically for the measurement of attitudes and preferences will be introduced. In contrast with the item response models currently used (e.g. the Rasch model and the two- and three-parameter logistic models) the item characteristic

  10. Using item response theory to measure extreme response style in marketing research: a global investigation

    NARCIS (Netherlands)

    Jong, de Martijn G.; Steenkamp, Jan-Benedict E.M.; Fox, Jean-Paul; Baumgartner, Hans

    2008-01-01

    Extreme response style (ERS) is an important threat to the validity of survey-based marketing research. In this article, the authors present a new item response theory–based model for measuring ERS. This model contributes to the ERS literature in two ways. First, the method improves on existing proc

  11. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    Science.gov (United States)

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…

  12. Stochastic Approximation Methods for Latent Regression Item Response Models

    Science.gov (United States)

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  13. Item Response Methods for Educational Assessment.

    Science.gov (United States)

    Mislevy, Robert J.; Rieser, Mark R.

    Multiple matrix sampling (MMS) theory indicates how data may be gathered to most efficiently convey information about levels of attainment in a population, but standard analyses of these data require random sampling of items from a fixed pool of items. This assumption proscribes the retirement of flawed or obsolete items from the pool as well as…

  14. A generalized item response tree model for psychological assessments.

    Science.gov (United States)

    Jeon, Minjeong; De Boeck, Paul

    2016-09-01

    A new item response theory (IRT) model with a tree structure has been introduced for modeling item response processes with a tree structure. In this paper, we present a generalized item response tree model with a flexible parametric form, dimensionality, and choice of covariates. The utilities of the model are demonstrated with two applications in psychological assessments for investigating Likert scale item responses and for modeling omitted item responses. The proposed model is estimated with the freely available R package flirt (Jeon et al., 2014b).

  15. Evaluating Item Discrimination Power of WHOQOL-BREF from an Item Response Model Perspectives

    Science.gov (United States)

    Lin, Ting Hsiang; Yao, Grace

    2009-01-01

    Quality of life (QOL) has become an important component of health. By using the methodology of psychometric theory, we examine the item properties of the WHOQOL-BREF. Samejima's graded response model with natural metrics of the logistic response function was fitted. The results showed items with negative natures were less discriminating. Items…

  16. Higher-Order Item Response Models for Hierarchical Latent Traits

    Science.gov (United States)

    Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming

    2013-01-01

    Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…

  17. Application of Unidimensional Item Response Models to Tests with Items Sensitive to Secondary Dimensions

    Science.gov (United States)

    Zhang, Bo

    2008-01-01

    In this research, the author addresses whether the application of unidimensional item response models provides valid interpretation of test results when administering items sensitive to multiple latent dimensions. Overall, the present study found that unidimensional models are quite robust to the violation of the unidimensionality assumption due…

  18. Using response times for item selection in adaptive testing

    NARCIS (Netherlands)

    Linden, van der Wim J.

    2008-01-01

    Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a s

  19. A lognormal model for response times on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.

    2006-01-01

    A lognormal model for the response times of a person on a set of test items is investigated. The model has a parameter structure analogous to the two-parameter logistic response models in item response theory, with a parameter for the speed of each person as well as parameters for the time intensity
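
    As an illustration of the kind of model this abstract describes, the following is a minimal sketch of one common parameterisation of a lognormal response-time density. The abstract is truncated, so the symbols used here (tau for person speed, beta for item time intensity, alpha for an item discrimination parameter) and the function names are assumptions made for the example, not definitions taken from the record.

```python
import math


def lognormal_rt_density(t, tau, alpha, beta):
    """Density of a response time t > 0 under an assumed lognormal model.

    tau   : person speed (higher means faster responses)       [assumed name]
    beta  : item time intensity (higher means slower item)     [assumed name]
    alpha : item discrimination, the inverse SD of log-time    [assumed name]
    """
    if t <= 0:
        raise ValueError("response time must be positive")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Standardised log-time: log t is normal with mean (beta - tau), SD 1/alpha.
    z = alpha * (math.log(t) - (beta - tau))
    return alpha / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)


def median_rt(tau, beta):
    """Median response time implied by the model: exp(beta - tau)."""
    return math.exp(beta - tau)


if __name__ == "__main__":
    # Hypothetical values: one item, a slower and a faster person.
    for tau in (0.0, 1.0):
        print("tau =", tau, "median time =", round(median_rt(tau, beta=4.0), 2))
```

    In this sketch the person and item parameters enter only through the difference beta - tau, which mirrors the way ability and difficulty enter a logistic response model.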

  20. The basics of item response theory using R

    CERN Document Server

    Baker, Frank B

    2017-01-01

    This graduate-level textbook is a tutorial for item response theory that covers both the basics of item response theory and the use of R for preparing graphical presentation in writings about the theory. Item response theory has become one of the most powerful tools used in test construction, yet one of the barriers to learning and applying it is the considerable amount of sophisticated computational effort required to illustrate even the simplest concepts. This text provides the reader access to the basic concepts of item response theory freed of the tedious underlying calculations. It is intended for those who possess limited knowledge of educational measurement and psychometrics. Rather than presenting the full scope of item response theory, this textbook is concise and practical and presents basic concepts without becoming enmeshed in underlying mathematical and computational complexities. Clearly written text and succinct R code allow anyone familiar with statistical concepts to explore and apply item re...

  1. The use of an item response theory-based disability item bank across diseases: accounting for differential item functioning.

    Science.gov (United States)

    Weisscher, Nadine; Glas, Cees A; Vermeulen, Marinus; De Haan, Rob J

    2010-05-01

    There is not a single universally accepted activity of daily living (ADL) instrument available to compare disability assessments across different patient groups. We developed a generic item bank of ADL items using item response theory, the Academic Medical Center Linear Disability Score (ALDS). When comparing outcomes of the ALDS between patient groups, item characteristics of the ALDS should be comparable across groups. The aim of the study was to assess the differential item functioning (DIF) in a group of patients with various disorders to investigate the comparability across these groups. Cross-sectional, multicenter study including 1,283 in- and outpatients with a variety of disorders and disability levels. The sample was divided into two groups: (1) mainly neurological patients (n=497; vascular medicine, Parkinson's disease and neuromuscular disorders) and (2) patients from internal medicine (n=786; pulmonary diseases, chronic pain, rheumatoid arthritis, and geriatric patients). Eighteen of 72 ALDS items showed statistically significant DIF (P<0.01). However, the DIF could effectively be modeled by the introduction of disease-specific parameters. In the subgroups studied, DIF could be modeled in such a way that the ensemble of the items comprised a scale applicable in both groups.

  2. Item Response Theory Using Hierarchical Generalized Linear Models

    Directory of Open Access Journals (Sweden)

    Hamdollah Ravand

    2015-03-01

    Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation studies with a methodological focus. Although the methodological direction was necessary as a first step to show how MLMs can be utilized and extended to model item response data, the emphasis needs to be shifted towards providing evidence on how applications of MLMs in educational testing can provide the benefits that have been promised. The present study uses foreign language reading comprehension data to illustrate application of hierarchical generalized models to estimate person and item parameters, differential item functioning (DIF), and local person dependence in a three-level model.

  3. Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach

    Directory of Open Access Journals (Sweden)

    JOSEPH P. EIMICKE

    2009-06-01

    The aims of this paper are to present findings related to differential item functioning (DIF) in the Patient Reported Outcome Measurement Information System (PROMIS) depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data) with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item, "I felt I had no energy," was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type 1 error) was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals.

  4. Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach

    Science.gov (United States)

    Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P.; Crane, Paul K.; Jones, Richard N.; Lai, Jin-shei; Choi, Seung W.; Hays, Ron D.; Reeve, Bryce B.; Reise, Steven P.; Pilkonis, Paul A.; Cella, David

    2009-01-01

    The aims of this paper are to present findings related to differential item functioning (DIF) in the Patient Reported Outcome Measurement Information System (PROMIS) depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data) with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were “I felt like crying” and “I had trouble enjoying things that I used to enjoy.” The item, “I felt I had no energy,” was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type 1 error) was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals. PMID:20336180

  5. Survey Page Length and Progress Indicators: What Are Their Relationships to Item Nonresponse?

    Science.gov (United States)

    Bowman, Nicholas A.; Herzog, Serge; Sarraf, Shimon; Tukibayeva, Malika

    2014-01-01

    The popularity of online student surveys has been associated with greater item nonresponse. This chapter presents research aimed at exploring what factors might help minimize item nonresponse, such as altering online survey page length and using progress indicators.

  6. Mixture randomized item-response modeling: a smoking behavior validation study.

    Science.gov (United States)

    Fox, J-P; Avetisyan, M; van der Palen, J

    2013-11-30

    Misleading response behavior is expected in medical settings where incriminating behavior is negatively related to the recovery from a disease. In the present study, lung patients feel social and professional pressure concerning smoking and experience questions about smoking behavior as sensitive and tend to conceal embarrassing or threatening information. The randomized item-response survey method is expected to improve the accuracy of self-reports as individual item responses are masked and only randomized item responses are observed. We explored the validation of the randomized item-response technique in a unique experimental study. Therefore, we administered a new multi-item measure assessing smoking behavior by using a treatment-control design (randomized response (RR) or direct questioning). After the questionnaire, we administered a breath test by using a carbon monoxide (CO) monitor to determine the smoking status of the patient. We used the response data to measure the individual smoking behavior by using a mixture item-response model. It is shown that the detected smokers scored significantly higher in the RR condition compared with the directly questioned condition. We proposed a Bayesian latent variable framework to evaluate the diagnostic test accuracy of the questionnaire using the randomized-response technique, which is based on the posterior densities of the subject's smoking behavior scores together with the breath test measurements. For different diagnostic test thresholds, we obtained moderate posterior mean estimates of sensitivity and specificity by observing a limited number of discrete randomized item responses. Copyright © 2013 John Wiley & Sons, Ltd.

  7. Implementation of the forced answering option within online surveys: Do higher item response rates come at the expense of participation and answer quality?

    Directory of Open Access Journals (Sweden)

    Décieux Jean Philippe

    2015-01-01

    Online surveys have become a popular method for data gathering for many reasons, including low costs and the ability to collect data rapidly. However, online data collection is often conducted without adequate attention to implementation details. One example is the frequent use of the forced answering option, which forces the respondent to answer each question in order to proceed through the questionnaire. The avoidance of missing data is often the idea behind the use of the forced answering option. However, we suggest that the costs of a reactance effect in terms of quality reduction and unit nonresponse may be high because respondents typically have plausible reasons for not answering questions. The objective of the study reported in this paper was to test the influence of forced answering on dropout rates and data quality. The results show that requiring participants to answer every question increases dropout rates and decreases the quality of answers. Our findings suggest that the desire for a complete data set has to be balanced against the consequences of reduced data quality.

  8. Semiparametric Item Response Functions in the Context of Guessing

    Science.gov (United States)

    Falk, Carl F.; Cai, Li

    2016-01-01

    We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
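
    To make the functional form in this abstract concrete, here is a minimal sketch of an item response function built as a logistic function of a monotonic polynomial with a lower asymptote. The particular polynomial used (a cubic with a positive linear coefficient and a non-negative cubic coefficient, which guarantees monotonicity) is a simplifying assumption chosen for illustration, not the authors' parameterisation; the names c, b0, b1 and b3 are likewise hypothetical. With b3 = 0 the function reduces to the three-parameter logistic model in slope-intercept form.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def irf_monotonic_poly(theta, c, b0, b1, b3=0.0):
    """P(correct | theta) = c + (1 - c) * logistic(m(theta)).

    m(theta) = b0 + b1*theta + b3*theta**3 is an illustrative monotonic
    polynomial: its derivative b1 + 3*b3*theta**2 is positive whenever
    b1 > 0 and b3 >= 0. c is the lower asymptote (guessing level).
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")
    if b1 <= 0 or b3 < 0:
        raise ValueError("need b1 > 0 and b3 >= 0 for monotonicity")
    m = b0 + b1 * theta + b3 * theta ** 3
    return c + (1.0 - c) * logistic(m)


if __name__ == "__main__":
    # Hypothetical item: guessing level 0.2, compared with and without the cubic term.
    for theta in (-2.0, 0.0, 2.0):
        p3pl = irf_monotonic_poly(theta, c=0.2, b0=0.0, b1=1.0)
        pmp = irf_monotonic_poly(theta, c=0.2, b0=0.0, b1=1.0, b3=0.3)
        print(theta, round(p3pl, 3), round(pmp, 3))
```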

  9. A nonparametric approach to the analysis of dichotomous item responses

    NARCIS (Netherlands)

    Mokken, R.J.; Lewis, C.

    1982-01-01

    An item response theory is discussed which is based on purely ordinal assumptions about the probabilities that people respond positively to items. It is considered as a natural generalization of both Guttman scaling and classical test theory. A distinction is drawn between construction and evaluatio

  10. Characterizing Sources of Uncertainty in Item Response Theory Scale Scores

    Science.gov (United States)

    Yang, Ji Seung; Hansen, Mark; Cai, Li

    2012-01-01

    Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…

  11. Inconsistent Student Responses in TIMSS Questionnaire Items on Mathematics Lessons

    Directory of Open Access Journals (Sweden)

    Selda Yıldırım

    2009-12-01

    This study investigated the consistency of Turkish students' responses to TIMSS 2007 questionnaire items on the frequency of certain activities in mathematics classrooms. In Turkey, 4476 students from 143 schools participated in the study. Analyses revealed inconsistencies in student responses, as indicated by a high proportion of within-class variance components. That is, students in the same class reported differing frequencies for certain classroom activities, showing that some factors had an effect on individuals' perceptions. Further analyses showed that students at different levels of mathematics achievement reported differently on the frequency of classroom activities, and that precise items were answered more consistently than items containing vague terms. Using factor scores instead of individual item responses improved the consistency of responses within classes, but only to a small extent. Based on the findings, this study also provides implications for questionnaire design.

  12. Item response theory modeling with nonignorable missing data

    NARCIS (Netherlands)

    Pimentel, Jonald L.

    2005-01-01

    This thesis discusses methods to detect nonignorable missing data and methods to adjust for the bias caused by nonignorable missing data, both by introducing a model for the missing data indicator using item response theory (IRT) models.

  13. An item factor analysis and item response theory-based revision of the Everyday Discrimination Scale.

    Science.gov (United States)

    Stucky, Brian D; Gottfredson, Nisha C; Panter, A T; Daye, Charles E; Allen, Walter R; Wightman, Linda F

    2011-04-01

    The Everyday Discrimination Scale (EDS), a widely used measure of daily perceived discrimination, is purported to be unidimensional, to function well among African Americans, and to have adequate construct validity. Two separate studies and data sources were used to examine and cross-validate the psychometric properties of the EDS. In Study 1, an exploratory factor analysis was conducted on a sample of African American law students (N = 589), providing strong evidence of local dependence, or nuisance multidimensionality within the EDS. In Study 2, a separate nationally representative community sample (N = 3,527) was used to model the identified local dependence in an item factor analysis (i.e., bifactor model). Next, item response theory (IRT) calibrations were conducted to obtain item parameters. A five-item, revised-EDS was then tested for gender differential item functioning (in an IRT framework). Based on these analyses, a summed score to IRT-scaled score translation table is provided for the revised-EDS. Our results indicate that the revised-EDS is unidimensional, with minimal differential item functioning, and retains predictive validity consistent with the original scale.

  14. Assessing Subgroup Differences in Item Response Times.

    Science.gov (United States)

    Schnipke, Deborah L.; Pashley, Peter J.

    Differences in test performance on time-limited tests may be due in part to differential response-time rates between subgroups, rather than real differences in the knowledge, skills, or developed abilities of interest. With computer-administered tests, response times are available and may be used to address this issue. This study investigates…

  15. An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models

    Science.gov (United States)

    Ames, Allison J.; Penfield, Randall D.

    2015-01-01

    Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing…

  16. Magnetometer Response of Commonly Found Munitions Items and Munitions Surrogates

    Science.gov (United States)

    2012-01-12

    Predicted minimum magnetometer anomaly strength for a variety of munitions and surrogate items at a burial depth corresponding to 11x their respective ... Response Live Site Demonstrations. The authors would like to thank Craig Murray of Parsons and Stephen Billings of Sky Research for their ... variety of munitions and surrogate items at a burial depth corresponding to 11x their respective diameter. The sensor is assumed to be deployed as part

  17. MODERATING ABILITY OF ITEM RESPONSE THEORY THROUGH PRIOR GUESSING PARAMETER

    Directory of Open Access Journals (Sweden)

    Siow Hoo Leong

    2013-01-01

    A psycho-technological approach to discouraging guessing on multiple-choice items is to reduce the a priori guessing probability of an item. This study proposes a psychometric framework based on Item Response Theory (IRT) to model the effect of varying a priori guessing probabilities across items. A prior guessing parameter is proposed to serve as a moderator of the ability parameter in the two-parameter logistic IRT model. The results show that the proposed prior guessing parameter successfully moderates the ability parameters of subjects with different degrees of guessing. However, the prior guessing parameter is insensitive when the performance pattern is mixed within a testlet but similar across testlets with different a priori guessing probabilities.

  18. DEVELOPING CRITICAL THINKING TEST UTILISING ITEM RESPONSE THEORY

    Directory of Open Access Journals (Sweden)

    Fajrianthi Fajrianthi

    2016-06-01

    This study aimed to produce a valid and reliable critical thinking instrument (test) for use in both educational and work settings in Indonesia. The study followed the test development stages of Hambleton and Jones (1993). The test blueprint and item writing were based on the concepts in the Watson-Glaser Critical Thinking Appraisal (WGCTA). In the WGCTA, critical thinking comprises five dimensions: Inference, Recognition of Assumptions, Deduction, Interpretation, and Evaluation of Arguments. The test was piloted with 1,453 participants in employee selection tests in Surabaya, Gresik, Tuban, Bojonegoro, and Rembang. Dichotomous data were analysed using a two-parameter IRT model, the parameters being item discrimination and item difficulty. The analysis was carried out with the statistical program Mplus version 6.11. Before the IRT analysis, the assumptions were tested: unidimensionality, local independence, and the Item Characteristic Curve (ICC). Analysis of the 68 items yielded 15 items with reasonably good discrimination and item difficulty ranging from -4 to 2.448. The small number of good-quality items was attributed to weaknesses in selecting subject matter experts in critical thinking and in the choice of scoring method. Keywords: test development, critical thinking, item response theory
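
    The two-parameter model named in this abstract (item discrimination and item difficulty) can be sketched as follows. This is a minimal illustration of the standard two-parameter logistic item characteristic curve, not the study's Mplus analysis; the function name is hypothetical, the discrimination values are invented for the example, and the difficulty values -4 and 2.448 are simply the endpoints of the range reported above.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model.

    theta : person ability
    a     : item discrimination
    b     : item difficulty (the ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # (discrimination, difficulty) pairs; discriminations are hypothetical.
    items = [(1.2, -4.0), (0.8, 0.0), (1.5, 2.448)]
    for a, b in items:
        probs = [round(icc_2pl(theta, a, b), 3) for theta in (-2.0, 0.0, 2.0)]
        print("a =", a, "b =", b, "P at theta -2, 0, 2:", probs)
```

    An item with difficulty near -4 is answered correctly by almost everyone in a typical ability range, which is one reason such items contribute little information.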

  19. The 12-item World Health Organization Disability Assessment Schedule II (WHO-DAS II): a nonparametric item response analysis

    Directory of Open Access Journals (Sweden)

    Fernandez Ana

    2010-05-01

    Background Previous studies have analyzed the psychometric properties of the World Health Organization Disability Assessment Schedule II (WHO-DAS II) using classical omnibus measures of scale quality. These analyses are sample dependent and do not model item responses as a function of the underlying trait level. The main objective of this study was to examine the effectiveness of the WHO-DAS II items and their options in discriminating between changes in the underlying disability level by means of item response analyses. We also explored differential item functioning (DIF) in men and women. Methods The participants were 3615 adult general practice patients from 17 regions of Spain, with a first diagnosed major depressive episode. The 12-item WHO-DAS II was administered by the general practitioners during the consultation. We used a non-parametric item response method (Kernel-Smoothing) implemented with the TestGraf software to examine the effectiveness of each item (item characteristic curves) and their options (option characteristic curves) in discriminating between changes in the underlying disability level. We examined composite DIF to determine whether women had a higher probability than men of endorsing each item. Results Item response analyses indicated that the twelve items forming the WHO-DAS II perform very well. All items were determined to provide good discrimination across varying standardized levels of the trait. The items also had option characteristic curves that showed good discrimination, given that each increasing option became more likely than the previous as a function of increasing trait level. No gender-related DIF was found on any of the items. Conclusions All WHO-DAS II items were very good at assessing overall disability. Our results supported the appropriateness of the weights assigned to response option categories and showed an absence of gender differences in item functioning.

  20. The Academic Medical Center Linear Disability Score (ALDS) item bank: item response theory analysis in a mixed patient population.

    Science.gov (United States)

    Holman, Rebecca; Weisscher, Nadine; Glas, Cees A W; Dijkgraaf, Marcel G W; Vermeulen, Marinus; de Haan, Rob J; Lindeboom, Robert

    2005-12-29

    Currently, there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. This paper examines the measurement properties of the Academic Medical Center linear disability score item bank in a mixed population. This paper uses item response theory to analyse data on 115 of 170 items from a total of 1002 respondents. These were: 551 (55%) residents of supported housing, residential care or nursing homes; 235 (23%) patients with chronic pain; 127 (13%) inpatients on a neurology ward following a stroke; and 89 (9%) patients suffering from Parkinson's disease. Of the 170 items, 115 were judged to be clinically relevant. Of these 115 items, 77 were retained in the item bank following the item response theory analysis. Of the 38 items that were excluded from the item bank, 24 had either been presented to fewer than 200 respondents or had fewer than 10% or more than 90% of responses in the category 'can carry out'. A further 11 items had different measurement properties for younger and older or for male and female respondents. Finally, 3 items were excluded because the item response theory model did not fit the data. The Academic Medical Center linear disability score item bank has promising measurement characteristics for the mixed patient population described in this paper. Further studies will be needed to examine the measurement properties of the item bank in other populations.

  1. The Academic Medical Center Linear Disability Score (ALDS) item bank: item response theory analysis in a mixed patient population

    Science.gov (United States)

    Holman, Rebecca; Weisscher, Nadine; Glas, Cees AW; Dijkgraaf, Marcel GW; Vermeulen, Marinus; de Haan, Rob J; Lindeboom, Robert

    2005-01-01

    Background Currently, there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. This paper examines the measurement properties of the Academic Medical Center linear disability score item bank in a mixed population. Methods This paper uses item response theory to analyse data on 115 of 170 items from a total of 1002 respondents. These were: 551 (55%) residents of supported housing, residential care or nursing homes; 235 (23%) patients with chronic pain; 127 (13%) inpatients on a neurology ward following a stroke; and 89 (9%) patients suffering from Parkinson's disease. Results Of the 170 items, 115 were judged to be clinically relevant. Of these 115 items, 77 were retained in the item bank following the item response theory analysis. Of the 38 items that were excluded from the item bank, 24 had either been presented to fewer than 200 respondents or had fewer than 10% or more than 90% of responses in the category 'can carry out'. A further 11 items had different measurement properties for younger and older or for male and female respondents. Finally, 3 items were excluded because the item response theory model did not fit the data. Conclusion The Academic Medical Center linear disability score item bank has promising measurement characteristics for the mixed patient population described in this paper. Further studies will be needed to examine the measurement properties of the item bank in other populations. PMID:16381611

  2. The Academic Medical Center Linear Disability Score (ALDS) item bank: item response theory analysis in a mixed patient population

    Directory of Open Access Journals (Sweden)

    Vermeulen Marinus

    2005-12-01

    Full Text Available Abstract Background Currently, there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. This paper examines the measurement properties of the Academic Medical Center linear disability score item bank in a mixed population. Methods This paper uses item response theory to analyse data on 115 of 170 items from a total of 1002 respondents. These were: 551 (55%) residents of supported housing, residential care or nursing homes; 235 (23%) patients with chronic pain; 127 (13%) inpatients on a neurology ward following a stroke; and 89 (9%) patients suffering from Parkinson's disease. Results Of the 170 items, 115 were judged to be clinically relevant. Of these 115 items, 77 were retained in the item bank following the item response theory analysis. Of the 38 items that were excluded from the item bank, 24 had either been presented to fewer than 200 respondents or had fewer than 10% or more than 90% of responses in the category 'can carry out'. A further 11 items had different measurement properties for younger and older or for male and female respondents. Finally, 3 items were excluded because the item response theory model did not fit the data. Conclusion The Academic Medical Center linear disability score item bank has promising measurement characteristics for the mixed patient population described in this paper. Further studies will be needed to examine the measurement properties of the item bank in other populations.

  3. Analyzing Force Concept Inventory with Item Response Theory

    CERN Document Server

    Wang, Jing

    2010-01-01

    Item Response Theory (IRT) is a popular assessment method used in education measurement, which builds on an assumption of a probability framework connecting students' innate ability and their actual performances on test items. The model transforms students' raw test scores through a nonlinear regression process into a scaled proficiency rating, which can be used to compare results obtained with different test questions. IRT also provides a theoretical approach to address ceiling effect and guessing. We applied IRT to analyze the Force Concept Inventory (FCI). The data was collected from 2802 students taking intro level mechanics courses at The Ohio State University. The data was analyzed with a 3-parameter item response model for multiple choice questions. We describe the procedures of the analysis and discuss the results and the interpretations. The analysis outcomes are compiled to provide a detailed IRT measurement metric of the FCI, which can be easily referenced and used by teachers and researchers for a...

  4. An Alternative Three-Parameter Logistic Item Response Model.

    Science.gov (United States)

    Pashley, Peter J.

    Birnbaum's three-parameter logistic function has become a common basis for item response theory modeling, especially within situations where significant guessing behavior is evident. This model is formed through a linear transformation of the two-parameter logistic function in order to facilitate a lower asymptote. This paper discusses an…
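
    As a minimal sketch of the relationship this abstract describes, the standard three-parameter logistic function can be written as a linear rescaling of the two-parameter curve so that it has a lower asymptote. The parameter names `a` (discrimination), `b` (difficulty) and `c` (lower asymptote) follow common IRT notation and are assumptions here; the alternative model proposed in the paper is not shown, since the truncated abstract does not specify it.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """Three-parameter logistic: the 2PL curve rescaled to have lower asymptote c."""
    return c + (1.0 - c) * p_2pl(theta, a, b)
```

    At `theta == b` the 2PL probability is 0.5, so the 3PL probability is `c + (1 - c) / 2`; for very low ability it approaches `c`, which is how the model accommodates guessing.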

  5. Multilevel Higher-Order Item Response Theory Models

    Science.gov (United States)

    Huang, Hung-Yu; Wang, Wen-Chung

    2014-01-01

    In the social sciences, latent traits often have a hierarchical structure, and data can be sampled from multiple levels. Both hierarchical latent traits and multilevel data can occur simultaneously. In this study, we developed a general class of item response theory models to accommodate both hierarchical latent traits and multilevel data. The…

  6. A short tutorial on item response theory in rheumatology

    NARCIS (Netherlands)

    Siemons, L.; Krishnan, E.

    2014-01-01

    OBJECTIVES: The aim is to familiarise physicians and researchers with the most important concepts of item response theory (IRT) and with its usefulness for improving test administration and data collection in health care. Special attention is given to the versatility of its use within the rheumatic

  7. A Framework for Dimensionality Assessment for Multidimensional Item Response Models

    Science.gov (United States)

    Svetina, Dubravka; Levy, Roy

    2014-01-01

    A framework is introduced for considering dimensionality assessment procedures for multidimensional item response models. The framework characterizes procedures in terms of their confirmatory or exploratory approach, parametric or nonparametric assumptions, and applicability to dichotomous, polytomous, and missing data. Popular and emerging…

  8. A Speeded Item Response Model with Gradual Process Change

    Science.gov (United States)

    Goegebeur, Yuri; De Boeck, Paul; Wollack, James A.; Cohen, Allan S.

    2008-01-01

    An item response theory model for dealing with test speededness is proposed. The model consists of two random processes, a problem solving process and a random guessing process, with the random guessing gradually taking over from the problem solving process. The involved change point and change rate are considered random parameters in order to…

  9. Testing Linear Models for Ability Parameters in Item Response Models

    NARCIS (Netherlands)

    Glas, Cees A.W.; Hendrawan, Irene

    2005-01-01

    Methods for testing hypotheses concerning the regression parameters in linear models for the latent person parameters in item response models are presented. Three tests are outlined: A likelihood ratio test, a Lagrange multiplier test and a Wald test. The tests are derived in a marginal maximum like

  10. Using SAS PROC MCMC for Item Response Theory Models

    Science.gov (United States)

    Ames, Allison J.; Samonte, Kelli

    2015-01-01

    Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian…

  11. Morphological Contributions to Adolescent Word Reading: An Item Response Approach

    Science.gov (United States)

    Goodwin, Amanda P.; Gilbert, Jennifer K.; Cho, Sun-Joo

    2013-01-01

    The current study uses a crossed random-effects item response model to simultaneously examine both reader and word characteristics and interactions between them that predict the reading of 39 morphologically complex words for 221 middle school students. Results suggest that a reader's ability to read a root word (e.g., "isolate") predicts that…

  12. The 18 Household Food Security Survey items provide valid food security classifications for adults and children in the Caribbean

    Directory of Open Access Journals (Sweden)

    Nunes Cheryl

    2006-02-01

    Full Text Available Abstract Background We tested the properties of the 18 Household Food Security Survey (HFSS) items, and the validity of the resulting food security classifications, in an English-speaking middle-income country. Methods Survey of primary school children in Trinidad and Tobago. Parents completed the HFSS. Responses were analysed for the 10 adult-referenced items and the eight child-referenced items. Item response theory models were fitted. Item calibrations and subject scores from a one-parameter logistic (1PL) model were compared with those from either a two-parameter logistic (2PL) model or a model for differential item functioning (DIF) by ethnicity. Results There were 5219 eligible with 3858 (74%) completing at least one food security item. Adult item calibrations (standard error) in the 1PL model ranged from -4.082 (0.019) for the 'worried food would run out' item to 3.023 (0.042) for 'adults often do not eat for a whole day'. Child item calibrations ranged from -3.715 (0.025) for 'relied on a few kinds of low cost food' to 3.088 (0.039) for 'child didn't eat for a whole day'. Fitting either a 2PL model, which allowed discrimination parameters to vary between items, or a differential item functioning model, which allowed item calibrations to vary between ethnic groups, had little influence on interpretation. The classification based on the adult-referenced items showed that there were 19% of respondents who were food insecure without hunger, 10% food insecure with moderate hunger and 6% food insecure with severe hunger. The classification based on the child-referenced items showed that there were 23% of children who were food insecure without hunger and 9% food insecure with hunger. In both children and adults food insecurity showed a strong, graded association with lower monthly household income (P Conclusion These results support the use of 18 HFSS items to classify food security status of adults or children in an English-speaking country where food

  13. An item response theory analysis of the narcissistic personality inventory.

    Science.gov (United States)

    Ackerman, Robert A; Donnellan, M Brent; Robins, Richard W

    2012-01-01

    This research uses item response theory methods to evaluate the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988). Analyses using the 2-parameter logistic model were conducted on the total score and the Corry, Merritt, Mrug, and Pamp (2008) and Ackerman et al. (2011) subscales for the NPI. In addition to offering precise information about the psychometric properties of the NPI item pool, these analyses generated insights that can be used to develop new measures of the personality constructs embedded within this frequently used inventory.

  14. Functionally unidimensional item response models for multivariate binary data

    DEFF Research Database (Denmark)

    Ip, Edward; Molenberghs, Geert; Chen, Shyh-Huei;

    2013-01-01

    The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model to such multidimensional data is believed to result in ability estimates that represent a combination of the major and minor dimensions. We conjecture that the underlying dimension for the fitted unidimensional model, which we call the functional dimension, represents a nonlinear projection. In this article we investigate... tool. An example regarding a construct of desire for physical competency is used to illustrate the functional unidimensional approach.

  15. Dimensionality of the UWES-17: An item response modelling analysis

    Directory of Open Access Journals (Sweden)

    Deon P. de Bruin

    2013-03-01

    Full Text Available Orientation: Questionnaires, particularly the Utrecht Work Engagement Scale (UWES-17), are an almost standard method by which to measure work engagement. Conflicting evidence regarding the dimensionality of the UWES-17 has led to confusion regarding the interpretation of scores. Research purpose: The main focus of this study was to use the Rasch model to provide insight into the dimensionality of the UWES-17, and to assess whether work engagement should be interpreted as one single overall score, three separate scores, or a combination. Motivation for the study: It is unclear whether a summative score is more representative of work engagement or whether scores are more meaningful when interpreted for each dimension separately. Previous work relied on confirmatory factor analysis; the potential of item response models has not been tapped. Research design: A quantitative cross-sectional survey design approach was used. Participants, 2429 employees of a South African Information and Communication Technology (ICT) company, completed the UWES-17. Main findings: Findings indicate that work engagement should be treated as a unidimensional construct: individual scores should be interpreted in a summative manner, giving a single global score. Practical/managerial implications: Users of the UWES-17 may interpret a single, summative score for work engagement. Findings of this study should also contribute towards standardising UWES-17 scores, allowing meaningful comparisons to be made. Contribution/value-add: The findings will benefit researchers, organisational consultants and managers. Clarity on dimensionality and interpretation of work engagement will assist researchers in future studies. Managers and consultants will be able to make better-informed decisions when using work engagement data.

  16. An item response curves analysis of the Force Concept Inventory

    Science.gov (United States)

    Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.

    2012-09-01

    Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.
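
    The idea of an item response curve, as described in this abstract, can be sketched by tabulating, for a single item, the share of students choosing each answer option at each ability level. The sketch below is not the authors' code: the function name `item_response_curves` is hypothetical, and using the total test score as the stand-in for ability level is an assumption made for illustration.

```python
from collections import Counter, defaultdict

def item_response_curves(total_scores, choices):
    """For one item, map each total score to the share of students picking each option.

    total_scores: one total test score per student (assumed proxy for ability).
    choices: the option each student selected on this item, in the same order.
    """
    counts = defaultdict(Counter)
    for score, choice in zip(total_scores, choices):
        counts[score][choice] += 1

    curves = {}
    for score, counter in counts.items():
        n = sum(counter.values())
        # Convert raw counts to proportions within this score group.
        curves[score] = {option: k / n for option, k in counter.items()}
    return curves
```

    Plotting these proportions against score, one line per option, would show both how the correct answer gains ground with ability and which distractors attract students at which levels.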

  17. Methodological note: allocation of disability items in the American Community Survey.

    Science.gov (United States)

    Siordia, Carlos; Young, Rebekah

    2013-04-01

    Determining the prevalence and correlates of disability requires the use of sample surveys in data analysis. In an effort to generate complete datasets, allocation procedures (i.e., the assignment of values to missing or illogical responses) are frequently used for missing or inconsistent responses. The goal of this investigation was to explore how six disability-related questions vary in their degree of allocation and how research results may be sensitive to this procedure. This is important because many researchers using large disability information banks are not survey methodologists and may be unaware of how the Census Bureau's editing procedures can influence research findings. We use 2010 1-year Public Use Microdata Sample files from the American Community Survey (ACS). We investigated the allocation rates of the following disability items: self-care; hearing; vision; independent living; ambulatory; and cognitive ability. We also asked how allocation rates varied by demographic characteristics and whether the allocated values could influence multivariate results. Disability item allocation in ACS data have detectable patterns, where the rate of disability allocation is higher for mail surveys, males, older people, groups who speak English not well or not at all, US citizens, Latinos(as), and for people living in or near poverty. Multivariate models may be sensitive to how these allocated values are treated. The rate of allocations varies as a function of demographic variables because of methodological procedures and survey participation behaviors. Because allocation rates may affect research and policy about the disabled population, more research is required. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Marginal Maximum Likelihood Estimation of Item Response Models in R

    Directory of Open Access Journals (Sweden)

    Matthew S. Johnson

    2007-02-01

    Full Text Available Item response theory (IRT) models are a class of statistical models used by researchers to describe the response behaviors of individuals to a set of categorically scored items. The most common IRT models can be classified as generalized linear fixed- and/or mixed-effect models. Although IRT models appear most often in the psychological testing literature, researchers in other fields have successfully utilized IRT-like models in a wide variety of applications. This paper discusses the three major methods of estimation in IRT and develops R functions utilizing the built-in capabilities of the R environment to find the marginal maximum likelihood estimates of the generalized partial credit model. The currently available R package ltm is also discussed.

  19. Introduction to bifactor polytomous item response theory analysis.

    Science.gov (United States)

    Toland, Michael D; Sulis, Isabella; Giambona, Francesca; Porcu, Mariano; Campbell, Jonathan M

    2017-02-01

    A bifactor item response theory model can be used to aid in the interpretation of the dimensionality of a multifaceted questionnaire that assumes continuous latent variables underlying the propensity to respond to items. This model can be used to describe the locations of people on a general continuous latent variable as well as on continuous orthogonal specific traits that characterize responses to groups of items. The bifactor graded response (bifac-GR) model is presented in contrast to a correlated traits (or multidimensional GR model) and unidimensional GR model. Bifac-GR model specification, assumptions, estimation, and interpretation are demonstrated with a reanalysis of data (Campbell, 2008) on the Shared Activities Questionnaire. We also show the importance of marginalizing the slopes for interpretation purposes and we extend the concept to the interpretation of the information function. To go along with the illustrative example analyses, we have made available supplementary files that include command file (syntax) examples and outputs from flexMIRT, IRTPRO, R, Mplus, and STATA. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jsp.2016.11.001. Data needed to reproduce analyses in this article are available as supplemental materials (online only) in the Appendix of this article. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  20. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    Science.gov (United States)

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  1. Controlling for rater effects when comparing survey items with incomplete Likert data.

    Science.gov (United States)

    Schulz, E M; Sun, A

    2001-01-01

    The rating scale model (Andrich, 1978) was applied to data from a survey that directed students to rate their satisfaction with college services on a five point Likert scale. Because students used different services, and students were directed to rate only the services they used, the items were differentially exposed to a person factor that we call "pleasability." Differential exposure to pleasability makes items' average rating a biased measure of their performance. In contrast, item parameter estimates in the rating scale model corrected for differential exposure to pleasability. Compared to items' average ratings, item parameter estimates in the rating scale model did a better job of predicting which item received the higher rating when any two items were rated by the same rater.

  2. The use of predicted values for item parameters in item response theory models: An application in intelligence tests

    NARCIS (Netherlands)

    Matteucci, M.; S. Mignani, Prof.; Veldkamp, Bernard P.

    2012-01-01

    In testing, item response theory models are widely used in order to estimate item parameters and individual abilities. However, even unidimensional models require a considerable sample size so that all parameters can be estimated precisely. The introduction of empirical prior information about candi

  3. Bookmark locations and item response model selection in the presence of local item dependence.

    Science.gov (United States)

    Skaggs, Gary

    2007-01-01

    The bookmark standard setting procedure is a popular method for setting performance standards on state assessment programs. This study reanalyzed data from an application of the bookmark procedure to a passage-based test that used the Rasch model to create the item ordered booklet. Several problems were noted in this implementation of the bookmark procedure, including disagreement among the SMEs about the correct order of items in the bookmark booklet, performance level descriptions of the passing standard being based on passage difficulty as well as item difficulty, and the presence of local item dependence within reading passages. Bookmark item locations were recalculated for the IRT three-parameter model and the multidimensional bifactor model. The results showed that the order of item locations was very similar for all three models when items of high difficulty and low discrimination were excluded. However, the items whose positions were the most discrepant between models were not the items that the SMEs disagreed about the most in the original standard setting. The choice of latent trait model did not address problems of item order disagreement. Implications for the use of the bookmark method in the presence of local item dependence are discussed.

  4. A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

    Science.gov (United States)

    Polak, Marike; De Rooij, Mark; Heiser, Willem J.

    2012-01-01

    In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…

  5. Mild to severe social fears: ranking types of feared social situations using item response theory.

    Science.gov (United States)

    Crome, Erica; Baillie, Andrew

    2014-06-01

    Social anxiety disorder is one of the most common mental disorders, and is associated with long term impairment, distress and vulnerability to secondary disorders. Certain types of social fears are more common than others, with public speaking fears typically the most prevalent in epidemiological surveys. The distinction between performance- and interaction-based fears has been the focus of long-standing debate in the literature, with evidence performance-based fears may reflect more mild presentations of social anxiety. This study aims to explicitly test whether different types of social fears differ in underlying social anxiety severity using item response theory techniques. Different types of social fears were assessed using items from three different structured diagnostic interviews in four different epidemiological surveys in the United States (n=2261, n=5411) and Australia (n=1845, n=1497); and ranked using 2-parameter logistic item response theory models. Overall, patterns of underlying severity indicated by different fears were consistent across the four samples with items functioning across a range of social anxiety. Public performance fears and speaking at meetings/classes indicated the lowest levels of social anxiety, with increasing severity indicated by situations such as being assertive or attending parties. Fears of using public bathrooms or eating, drinking or writing in public reflected the highest levels of social anxiety. Understanding differences in the underlying severity of different types of social fears has important implications for the underlying structure of social anxiety, and may also enhance the delivery of social anxiety treatment at a population level.

  6. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    Science.gov (United States)

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism.

  7. Generalized Fiducial Inference for Binary Logistic Item Response Models.

    Science.gov (United States)

    Liu, Yang; Hannig, Jan

    2016-06-01

    Generalized fiducial inference (GFI) has been proposed as an alternative to likelihood-based and Bayesian inference in mainstream statistics. Confidence intervals (CIs) can be constructed from a fiducial distribution on the parameter space in a fashion similar to those used with a Bayesian posterior distribution. However, no prior distribution needs to be specified, which renders GFI more suitable when no a priori information about model parameters is available. In the current paper, we apply GFI to a family of binary logistic item response theory models, which includes the two-parameter logistic (2PL), bifactor and exploratory item factor models as special cases. Asymptotic properties of the resulting fiducial distribution are discussed. Random draws from the fiducial distribution can be obtained by the proposed Markov chain Monte Carlo sampling algorithm. We investigate the finite-sample performance of our fiducial percentile CI and two commonly used Wald-type CIs associated with maximum likelihood (ML) estimation via Monte Carlo simulation. The use of GFI in high-dimensional exploratory item factor analysis was illustrated by the analysis of a set of the Eysenck Personality Questionnaire data.

  8. Adult Attachment Ratings (AAR): an item response theory analysis.

    Science.gov (United States)

    Pilkonis, Paul A; Kim, Yookyung; Yu, Lan; Morse, Jennifer Q

    2014-01-01

    The Adult Attachment Ratings (AAR) include 3 scales for anxious, ambivalent attachment (excessive dependency, interpersonal ambivalence, and compulsive care-giving), 3 for avoidant attachment (rigid self-control, defensive separation, and emotional detachment), and 1 for secure attachment. The scales include items (ranging from 6-16 in their original form) scored by raters using a 3-point format (0 = absent, 1 = present, and 2 = strongly present) and summed to produce a total score. Item response theory (IRT) analyses were conducted with data from 414 participants recruited from psychiatric outpatient, medical, and community settings to identify the most informative items from each scale. The IRT results allowed us to shorten the scales to 5-item versions that are more precise and easier to rate because of their brevity. In general, the effective range of measurement for the scales was 0 to +2 SDs for each of the attachment constructs; that is, from average to high levels of attachment problems. Evidence for convergent and discriminant validity of the scales was investigated by comparing them with the Experiences of Close Relationships-Revised (ECR-R) scale and the Kobak Attachment Q-sort. The best consensus among self-reports on the ECR-R, informant ratings on the ECR-R, and expert judgments on the Q-sort and the AAR emerged for anxious, ambivalent attachment. Given the good psychometric characteristics of the scale for secure attachment, however, this measure alone might provide a simple alternative to more elaborate procedures for some measurement purposes. Conversion tables are provided for the 7 scales to facilitate transformation from raw scores to IRT-calibrated (theta) scores.

  9. Students' proficiency scores within multitrait item response theory

    Science.gov (United States)

    Scott, Terry F.; Schumayer, Daniel

    2015-12-01

    In this paper we present a series of item response models of data collected using the Force Concept Inventory. The Force Concept Inventory (FCI) was designed to poll the Newtonian conception of force viewed as a multidimensional concept, that is, as a complex of distinguishable conceptual dimensions. Several previous studies have developed single-trait item response models of FCI data; however, we feel that multidimensional models are also appropriate given the explicitly multidimensional design of the inventory. The models employed in the research reported here vary in both the number of fitting parameters and the number of underlying latent traits assumed. We calculate several model information statistics to ensure adequate model fit and to determine which of the models provides the optimal balance of information and parsimony. Our analysis indicates that all item response models tested, from the single-trait Rasch model through to a model with ten latent traits, satisfy the standard requirements of fit. However, analysis of model information criteria indicates that the five-trait model is optimal. We note that an earlier factor analysis of the same FCI data also led to a five-factor model. Furthermore the factors in our previous study and the traits identified in the current work match each other well. The optimal five-trait model assigns proficiency scores to all respondents for each of the five traits. We construct a correlation matrix between the proficiencies in each of these traits. This correlation matrix shows strong correlations between some proficiencies, and strong anticorrelations between others. We present an interpretation of this correlation matrix.

  10. Development of the Quantitative Reasoning Items on the National Survey of Student Engagement

    Directory of Open Access Journals (Sweden)

    Amber D. Dumford

    2015-01-01

    As society’s needs for quantitative skills become more prevalent, college graduates require quantitative skills regardless of their career choices. Therefore, it is important that institutions assess students’ engagement in quantitative activities during college. This study chronicles the process taken by the National Survey of Student Engagement (NSSE) to develop items that measure students’ participation in quantitative reasoning (QR) activities. On the whole, findings across the quantitative and qualitative analyses suggest good overall properties for the developed QR items. The items show great promise to explore and evaluate the frequency with which college students participate in QR-related activities. Each year, hundreds of institutions across the United States and Canada participate in NSSE, and, with the addition of these new items on the core survey, every participating institution will have information on this topic. Our hope is that these items will spur conversations on campuses about students’ use of quantitative reasoning activities.

  11. Pattern analysis of total item score and item response of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative sample of US adults.

    Science.gov (United States)

    Tomitaka, Shinichiro; Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Yutaka, Ono; Furukawa, Toshiaki A

    2017-01-01

    Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: "none of the time," "a little of the time," "some of the time," "most of the time," and "all of the time." The patterns of the total score distribution and item responses were analyzed using graphical analysis and an exponential regression model. The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from "a little of the time" to "all of the time" on log-normal scales, while the "none of the time" response was not related to this exponential pattern. The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales.
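
    An exponential pattern of this kind appears as a straight line when the logarithm of the frequency is plotted against the score. The following is a minimal sketch of such an exponential regression, assuming ordinary least squares on log counts; the function name `fit_exponential` and the toy counts are hypothetical and are not taken from the MIDUS data or from the authors' analysis.

```python
import math

def fit_exponential(scores, counts):
    """Fit counts ~ A * exp(-rate * score) by least squares on log(counts).

    Returns (A, rate). Assumes every count is positive.
    """
    logs = [math.log(c) for c in counts]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logs))
    slope = sxy / sxx                      # slope of the line on the log scale
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Hypothetical counts that halve with each additional score point.
# The lowest category is left out, mirroring the abstract's remark that
# the lower end of the distribution does not follow the pattern.
scores = [1, 2, 3, 4, 5]
counts = [800, 400, 200, 100, 50]
amplitude, rate = fit_exponential(scores, counts)
print(f"A = {amplitude:.1f}, rate = {rate:.4f}")
```

    Comparing the fitted `rate` across subsamples is one simple way to check whether they share similar rate parameters.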

  12. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    Science.gov (United States)

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  13. Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

    Science.gov (United States)

    Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

    2011-01-01

    The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…

  14. Functionally Unidimensional Item Response Models for Multivariate Binary Data.

    Science.gov (United States)

    Ip, Edward H; Molenberghs, Geert; Chen, Shyh-Huei; Goegebeur, Yuri; De Boeck, Paul

    2013-07-01

    The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model to such multidimensional data is believed to result in ability estimates that represent a combination of the major and minor dimensions. We conjecture that the underlying dimension for the fitted unidimensional model, which we call the functional dimension, represents a nonlinear projection. In this article we investigate 2 issues: (a) can a proposed nonlinear projection track the functional dimension well, and (b) what are the biases in the ability estimate and the associated standard error when estimating the functional dimension? To investigate the second issue, the nonlinear projection is used as an evaluative tool. An example regarding a construct of desire for physical competency is used to illustrate the functional unidimensional approach.

  15. [Unfolding item response model using best-worst scaling].

    Science.gov (United States)

    Ikehara, Kazuya

    2015-02-01

    In attitude measurement and sensory tests, the unfolding model is typically used. In this model, response probability is formulated by the distance between the person and the stimulus. In this study, we proposed an unfolding item response model using best-worst scaling (BWU model), in which a person chooses the best and worst stimulus among repeatedly presented subsets of stimuli. We also formulated an unfolding model using best scaling (BU model), and compared the accuracy of estimates between the BU and BWU models. A simulation experiment showed that the BWU model performed much better than the BU model in terms of bias and root mean square errors of estimates. With reference to Usami (2011), the proposed models were applied to actual data to measure attitudes toward tardiness. Results indicated high similarity between stimuli estimates generated with the proposed models and those of Usami (2011).

  16. Updated U.S. population standard for the Veterans RAND 12-item Health Survey (VR-12).

    Science.gov (United States)

    Selim, Alfredo J; Rogers, William; Fleishman, John A; Qian, Shirley X; Fincke, Benjamin G; Rothendler, James A; Kazis, Lewis E

    2009-02-01

    The purpose of this project was to develop an updated U.S. population standard for the Veterans RAND 12-item Health Survey (VR-12). We used a well-defined and nationally representative sample of the U.S. population from 52,425 responses to the Medical Expenditure Panel Survey (MEPS) collected between 2000 and 2002. We applied modified regression estimates to update the non-proprietary 1990 scoring algorithms. We applied the updated standard to the Medicare Health Outcomes Survey (HOS) to compute the VR-12 physical (PCS(MEPS standard)) and mental (MCS(MEPS standard)) component summaries based on the MEPS. We compared these scores to PCS and MCS based on the 1990 U.S. population standard. Using the updated U.S. population standard, the average VR-12 PCS(MEPS standard) and MCS(MEPS standard) scores in the Medicare HOS were 39.82 (standard deviation [SD] = 12.2) and 50.08 (SD = 11.4), respectively. For the same Medicare HOS, the average PCS and MCS scores based on the 1990 standard were 1.40 points higher and 0.99 points lower in comparison to VR-12 PCS and MCS, respectively. Changes in the U.S. population between 1990 and today make the old standard obsolete for the VR-12, so the updated standard developed here is widely available to serve as such a contemporary standard for future applications for health-related quality of life (HRQoL) assessments.

  17. Influence of Item Direction on Student Responses in Attitude Assessment.

    Science.gov (United States)

    Campbell, Noma Jo; Grissom, Stephen

    To investigate the effects of wording in attitude test items, a five-point Likert-type rating scale was administered to 173 undergraduate education majors. The test measured attitudes toward college and self, and contained 38 positively-worded items. Thirty-eight negatively-worded items were also written to parallel the positive statements.…

  18. Improving Item Response Theory Model Calibration by Considering Response Times in Psychological Tests

    Science.gov (United States)

    Ranger, Jochen; Kuhn, Jorg-Tobias

    2012-01-01

    Research findings indicate that response times in personality scales are related to the trait level according to the so-called speed-distance hypothesis. Against this background, Ferrando and Lorenzo-Seva proposed a latent trait model for the responses and response times in a test. The model consists of two components, a standard item response…

  19. Is Bloom's Taxonomy reflected in the response pattern to MCQ items?

    Science.gov (United States)

    Huxham, G J; Naeraa, N

    1980-01-01

    The purpose of this study was to find out whether taxonomic classification of MCQ items reflected differences in student behaviour. The data from one of this University's official open-book exams, in which students answer sixty MCQ items distributed over twelve content-areas of physiology, were examined. The responses from all 153 candidates were then subjected to factor analysis. Analysis of individual item scores was unrewarding. Analysis of scores for item-groups based on taxonomy and content resulted in the identification of three factors, which carried predominant loadings from recall or look-up items, interpretation items, and problem-solving items, respectively.

  20. Development and validation of an item response theory-based Social Responsiveness Scale short form.

    Science.gov (United States)

    Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T

    2017-09-01

    Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.

  1. Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory

    Science.gov (United States)

    Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.

    2016-01-01

    Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…

  2. Limits on Log Odds Ratios for Unidimensional Item Response Theory Models

    Science.gov (United States)

    Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip

    2007-01-01

    Bounds are established for log odds ratios (log cross-product ratios) involving pairs of items for item response models. First, expressions for bounds on log odds ratios are provided for one-dimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model. Results are…
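
    The quantity being bounded, the log odds ratio (log cross-product ratio) for a pair of binary items, can be computed from the 2 x 2 table of joint responses. The following is a minimal sketch of the sample version of that quantity only; it does not implement the bounds derived in the paper, and the function name and toy data are hypothetical.

```python
import math

def log_odds_ratio(item_a, item_b):
    """Sample log cross-product ratio for two binary (0/1) item score vectors."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(item_a, item_b):
        if a and b:
            n11 += 1          # both items answered correctly
        elif a:
            n10 += 1          # only the first item correct
        elif b:
            n01 += 1          # only the second item correct
        else:
            n00 += 1          # neither item correct
    if 0 in (n11, n10, n01, n00):
        raise ValueError("log odds ratio is undefined when a cell is empty")
    return math.log((n11 * n00) / (n10 * n01))

# Hypothetical responses of eight examinees to two items.
item_a = [1, 1, 1, 0, 0, 0, 1, 0]
item_b = [1, 1, 0, 0, 0, 1, 1, 0]
lor = log_odds_ratio(item_a, item_b)
print(round(lor, 4))
```

    A positive value indicates that success on one item goes together with success on the other in the observed data.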

  4. Estimation of Item Response Theory Parameters in the Presence of Missing Data

    Science.gov (United States)

    Finch, Holmes

    2008-01-01

    Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…

  5. Resolving Dimensionality Problems With WHOQOL-BREF Item Responses.

    Science.gov (United States)

    Perera, Harsha N; Izadikhah, Zahra; O'Connor, Peter; McIlveen, Peter

    2016-11-20

    The World Health Organization Quality of Life Scale (WHOQOL-BREF) is predicated on a multidimensional perspective on quality of life (QOL); yet studies are unclear about the latent structure underlying responses. This article reports on a study conducted to investigate the structure of WHOQOL-BREF scores. Competing latent structures of the data were examined in a general population sample. In addition, the complete factorial invariance of the retained model was investigated across gender. We also investigated latent mean differences in the QOL dimensions over age as well as age by gender interaction effects. Based on responses to the WHOQOL-BREF, support was found for a bifactor exploratory structural equation modeling representation of the data. This measurement structure accounts for construct-relevant multidimensionality in item responses due to the presence of general and specific factors underlying the data and the fallibility of indicators as pure reflections of only the single constructs they are purported to measure. Furthermore, support was found for measurement and structural invariance across gender. Finally, evidence was obtained for a curvilinear relationship of age with QOL, characterized by a midlife nadir. Taken together, the results of the study yield important validation data for the WHOQOL-BREF and tentatively resolve the dimensionality issues in the measurement of QOL using this instrument.

  6. Determining differential item functioning and its effect on the test scores of selected pib indexes, using item response theory techniques

    Directory of Open Access Journals (Sweden)

    Pieter Schaap

    2001-02-01

    The objective of this article is to present the results of an investigation into the item and test characteristics of two tests of the Potential Index Batteries (PIB) in terms of differential item functioning (DIF) and the effect thereof on test scores of different race groups. The English Vocabulary (Index 12) and Spelling Tests (Index 22) of the PIB were analysed for white, black and coloured South Africans. Item response theory (IRT) methods were used to identify items which function differentially for white, black and coloured race groups. Summary: The aim of this article is to report the results of an investigation into the item and test characteristics of two PIB (Potential Index Batteries) tests in terms of item bias and the influence it has on the test scores of race groups. The English Vocabulary (Index 12) and Spelling Tests (Index 22) of the Potential Index Batteries (PIB) were analysed for white, black and coloured South Africans. Item response theory (IRT) was used to identify items that can be regarded as biased (DIF) for the respective race groups.

  7. Item response theory-based measure of global disability in multiple sclerosis derived from the Performance Scales and related items.

    Science.gov (United States)

    Chamot, Eric; Kister, Ilya; Cutter, Gary R

    2014-10-03

    The eight Performance Scales and three assimilated scales (PS) used in North American Research Committee on Multiple Sclerosis (NARCOMS) registry surveys cover a broad range of neurologic domains commonly affected by multiple sclerosis (mobility, hand function, vision, fatigue, cognition, bladder/bowel, sensory, spasticity, pain, depression, and tremor/coordination). Each scale consists of a single 6-to-7-point Likert item with response categories ranging from "normal" to "total disability". Relatively little is known about the performances of the summary index of disability derived from these scales (the Performance Scales Sum or PSS). In this study, we demonstrate the value of a combination of classical and modern methods recently proposed by the Patient-Reported Outcome Measurement Information System (PROMIS) network to evaluate the psychometric properties of the PSS and derive an improved measure of global disability from the PS. The study sample included 7,851 adults with MS who completed a NARCOMS intake questionnaire between 2003 and 2011. Factor analysis, bifactor modeling, and item response theory (IRT) analysis were used to evaluate the dimension(s) of disability underlying the PS; calibrate the 11 scales; and generate three alternative summary scores of global disability corresponding to different model assumptions and practical priorities. The construct validity of the three scores was compared by examining the magnitude of their associations with participants' background characteristics, including unemployment. We derived structurally valid measures of global disability from the PS through the proposed methodology that were superior to the PSS. The measure most applicable to clinical practice gives similar weight to physical and mental disability. Overall reliability of the new measure is acceptable for individual comparisons (0.87). Higher scores of global disability were significantly associated with older age at assessment, longer disease duration

  8. Statistical tests of conditional independence between responses and/or response times on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.; Glas, Cornelis A.W.

    2010-01-01

    Three plausible assumptions of conditional independence in a hierarchical model for responses and response times on test items are identified. For each of the assumptions, a Lagrange multiplier test of the null hypothesis of conditional independence against a parametric alternative is derived. The t

  9. Employment of Item Response Theory to measure change in Children's Analogical Thinking Modifiability Test

    OpenAIRE

    Queiroz,Odoisa Antunes de; Primi,Ricardo; Carvalho,Lucas de Francisco; Enumo,Sônia Regina Fiorim

    2013-01-01

    Dynamic testing, with an intermediate phase of assistance, measures changes between pretest and post-test assuming a common metric between them. To test this assumption, we applied item response theory to the responses of 69 children to an adapted version of the dynamic cognitive test Children's Analogical Thinking Modifiability Test, with 12 items, totaling 828 responses, in order to verify whether the original scale yields the same results as the equalized scale obtained by Item Response Theory i...

  10. A Comparison of Item Parameter Standard Error Estimation Procedures for Unidimensional and Multidimensional Item Response Theory Modeling

    Science.gov (United States)

    Paek, Insu; Cai, Li

    2014-01-01

    The present study was motivated by the recognition that standard errors (SEs) of item response theory (IRT) model parameters are often of immediate interest to practitioners and that there is currently a lack of comparative research on different SE (or error variance-covariance matrix) estimation procedures. The present study investigated item…

  11. Estimating Non-Normal Latent Trait Distributions within Item Response Theory Using True and Estimated Item Parameters

    Science.gov (United States)

    Sass, D. A.; Schmitt, T. A.; Walker, C. M.

    2008-01-01

    Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal…

  14. Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-01-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…

  15. Validity of Suicidality Items from the Youth Risk Behavior Survey in a High School Sample

    Science.gov (United States)

    May, Alexis; Klonsky, E. David

    2011-01-01

    The Youth Risk Behavior Survey (YRBS) is used by the United States Centers for Disease Control to estimate rates of suicidal thoughts and behaviors in adolescents. This study investigated the validity of the YRBS suicidality items by examining their relationship to criterion variables including loneliness, anxiety, depression, substance use, and…

  16. Difference in method of administration did not significantly impact item response

    DEFF Research Database (Denmark)

    Bjorner, Jakob B; Rose, Matthias; Gandek, Barbara

    2014-01-01

    PURPOSE: To test the impact of method of administration (MOA) on the measurement characteristics of items developed in the Patient-Reported Outcomes Measurement Information System (PROMIS). METHODS: Two non-overlapping parallel 8-item forms from each of three PROMIS domains (physical function, fa...... levels in IVR, PQ, or PDA administration as compared to PC. Availability of large item response theory-calibrated PROMIS item banks allowed for innovations in study design and analysis....

  17. Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

    Directory of Open Access Journals (Sweden)

    Zwinderman Aeilko H

    2004-06-01

    Background: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used.

  18. Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

    Science.gov (United States)

    Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

    2004-01-01

    Background: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
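
    The difference between the two imputation strategies named above can be illustrated with a small sketch. It rests on the textbook definitions (cold deck: fill from a source outside the data set; hot deck: fill from a donor inside the data set), which is an assumption, since the abstract does not describe the authors' exact procedures. The function names, the `None` coding for 'not applicable', and the toy data are hypothetical.

```python
import random

def cold_deck(responses, external_values):
    """Replace missing (None) entries with a fixed per-item value
    taken from an external source."""
    return [
        [external_values[j] if value is None else value
         for j, value in enumerate(row)]
        for row in responses
    ]

def hot_deck(responses, rng):
    """Replace missing (None) entries with the response of a randomly
    chosen donor from the same data set who answered that item.

    Raises IndexError if nobody answered the item.
    """
    filled = []
    for row in responses:
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                donors = [r[j] for r in responses if r[j] is not None]
                value = rng.choice(donors)
            new_row.append(value)
        filled.append(new_row)
    return filled

# Hypothetical binary responses of three respondents to three items.
data = [[1, 0, None], [1, None, 1], [0, 0, 1]]
cold = cold_deck(data, external_values=[0, 0, 0])
hot = hot_deck(data, random.Random(0))
print(cold)
print(hot)
```

    In this toy data set every donor for a given item gave the same answer, so the hot deck result does not depend on the random seed; with real data it would.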

  19. Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model

    Science.gov (United States)

    Andrich, David; Humphry, Stephen M.; Marais, Ida

    2012-01-01

    Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…

  20. The Influence of Item Response Indecision on the Self-Directed Search

    Science.gov (United States)

    Sampson, James P., Jr.; Shy, Jonathan D.; Hartley, Sarah Lucas; Reardon, Robert C.; Peterson, Gary W.

    2009-01-01

    Students (N = 247) responded to Self-Directed Search (SDS) per the standard response format and were also instructed to record a question mark (?) for items about which they were uncertain (item response indecision [IRI]). The initial responses of the 114 participants with a (?) were then reversed and a second SDS summary code was obtained and…

  1. Revisiting the 4-Parameter Item Response Model: Bayesian Estimation and Application.

    Science.gov (United States)

    Culpepper, Steven Andrew

    2016-12-01

    There has been renewed interest in Barton and Lord's (An upper asymptote for the three-parameter logistic item response model (Tech. Rep. No. 80-20). Educational Testing Service, 1981) four-parameter item response model. This paper presents a Bayesian formulation that extends Béguin and Glas (MCMC estimation and some model fit analysis of multidimensional IRT models. Psychometrika, 66 (4):541-561, 2001) and proposes a model for the four-parameter normal ogive (4PNO) model. Monte Carlo evidence is presented concerning the accuracy of parameter recovery. The simulation results support the use of less informative uniform priors for the lower and upper asymptotes, which is an advantage to prior research. Monte Carlo results provide some support for using the deviance information criterion and [Formula: see text] index to choose among models with two, three, and four parameters. The 4PNO is applied to 7491 adolescents' responses to a bullying scale collected under the 2005-2006 Health Behavior in School-Aged Children study. The results support the value of the 4PNO to estimate lower and upper asymptotes in large-scale surveys.
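
    As a reading aid for the abstract above, the following is a minimal sketch of a four-parameter normal ogive response function, in which the success probability rises from a lower asymptote to an upper asymptote along a normal CDF. The function names, the a(theta - b) parameterization, and the example values are assumptions chosen for illustration; they are not taken from the paper, whose own parameterization may differ.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_4pno(theta: float, a: float, b: float, lower: float, upper: float) -> float:
    """Probability of a positive response under a 4-parameter normal ogive.

    theta : latent trait of the respondent
    a, b  : discrimination and location (hypothetical parameterization)
    lower : lower asymptote (response probability at very low theta)
    upper : upper asymptote (response probability at very high theta)
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    return lower + (upper - lower) * normal_cdf(a * (theta - b))


if __name__ == "__main__":
    # Illustrative values only: asymptotes at 0.1 and 0.9.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, round(p_4pno(theta, a=1.0, b=0.0, lower=0.1, upper=0.9), 4))
```

    Setting lower = 0 and upper = 1 reduces the sketch to a two-parameter normal ogive, and fixing only upper = 1 gives a three-parameter form.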

  2. Predicting survey responses: how and why semantics shape survey statistics on organizational behaviour.

    Science.gov (United States)

    Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Bong, Chih How

    2014-01-01

    Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60-86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.

  3. Predicting survey responses: how and why semantics shape survey statistics on organizational behaviour.

    Directory of Open Access Journals (Sweden)

    Jan Ketil Arnulf

    Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60-86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.

  4. Full-Information Item Bifactor Analysis of Graded Response Data

    Science.gov (United States)

    Gibbons, Robert D.; Bock, R. Darrell; Hedeker, Donald; Weiss, David J.; Segawa, Eisuke; Bhaumik, Dulal K.; Kupfer, David J.; Frank, Ellen; Grochocinski, Victoria J.; Stover, Angela

    2007-01-01

    A plausible factorial structure for many types of psychological and educational tests exhibits a general factor and one or more group or method factors. This structure can be represented by a bifactor model. The bifactor structure results from the constraint that each item has a nonzero loading on the primary dimension and, at most, one of the…

  5. The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

    Science.gov (United States)

    Schilling, Stephen G.

    2007-01-01

    In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…

  6. A Polytomous Item Response Theory Analysis of Social Physique Anxiety Scale

    Science.gov (United States)

    Fletcher, Richard B.; Crocker, Peter

    2014-01-01

    The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…

  7. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    Science.gov (United States)

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  8. Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

    Science.gov (United States)

    Min, Shangchao; He, Lianzhen

    2014-01-01

    This study examined the relative effectiveness of the multidimensional bi-factor model and multidimensional testlet response theory (TRT) model in accommodating local dependence in testlet-based reading assessment with both dichotomously and polytomously scored items. The data used were 14,089 test-takers' item-level responses to the testlet-based…

  9. Stochastic order in dichotomous item response models for fixed tests, research adaptive tests, or multiple abilities

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1995-01-01

    Dichotomous item response theory (IRT) models can be viewed as families of stochastically ordered distributions of responses to test items. This paper explores several properties of such distributions. The focus is on the conditions under which stochastic order in families of conditional distribution

  10. Modelling non-ignorable missing-data mechanisms with item response theory models

    NARCIS (Netherlands)

    Holman, Rebecca; Glas, Cees A.W.

    2005-01-01

    A model-based procedure for assessing the extent to which missing data can be ignored and handling non-ignorable missing data is presented. The procedure is based on item response theory modelling. As an example, the approach is worked out in detail in conjunction with item response data modelled us

  11. Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model

    Science.gov (United States)

    Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim

    2017-01-01

    We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…

  12. The Random Response Technique as an Indicator of Questionnaire Item Social Desirability/Personal Sensitivity.

    Science.gov (United States)

    Crino, Michael D.; And Others

    1985-01-01

    The random response technique was compared to a direct questionnaire, administered to college students, to investigate whether or not the responses predicted the social desirability of the item. Results suggest support for the hypothesis. A 33-item version of the Marlowe-Crowne Social Desirability Scale which was used is included. (GDC)

  13. Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

    Science.gov (United States)

    Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

    2015-12-01

    The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity.

  14. Theoretical and Empirical Comparisons between Two Models for Continuous Item Response.

    Science.gov (United States)

    Ferrando, Pere J

    2002-10-01

    This article analyzes the relations between two continuous response models intended for typical response items: the linear congeneric model and Samejima's continuous response model (CRM). Using a factor analytical (FA) approach based on the assumption of underlying response variables, I describe how a particular case of the CRM can be considered as a nonlinear counterpart of Spearman's FA model. The mathematical relations between the: item-trait regressions, item parameter values, and conditional and marginal distributions of both models are obtained. The results allow (a) the item parameter values of the linear model to be obtained from CRM item parameter values, and (b) the conditions in which the congeneric model will be a good approximation to the CRM to be predicted. The relations described are illustrated using an empirical example and assessed by means of a simulation study.

  15. Using item response theory models to evaluate the Practice Environment Scale.

    Science.gov (United States)

    Raju, Dheeraj; Su, Xiaogang; Patrician, Patricia A

    2014-01-01

    The purpose of this article is to introduce different types of item response theory models and to demonstrate their usefulness by evaluating the Practice Environment Scale. Item response theory models such as constrained and unconstrained graded response model, partial credit model, Rasch model, and one-parameter logistic model are demonstrated. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) indices are used as model selection criteria. The unconstrained graded response and partial credit models indicated the best fit for the data. Almost all items in the instrument performed well. Although most of the items strongly measure the construct, there are a few items that could be eliminated without substantially altering the instrument. The analysis revealed that the instrument may function differently when administered to different unit types.

  16. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
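
    The three-parameter logistic model named in this abstract can be illustrated with a short sketch. It evaluates the standard 3PL item response function, in which a guessing parameter sets a floor on the probability of a correct answer. The function name and the parameter values below are hypothetical and are not estimates from the TUV study.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta : ability of the test taker
    a     : item discrimination
    b     : item difficulty
    c     : pseudo-guessing parameter (lower asymptote), 0 <= c < 1
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("guessing parameter must satisfy 0 <= c < 1")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical five-option item: guessing floor of 0.2.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(p_3pl(theta, a=1.2, b=0.0, c=0.2), 4))
```

    At theta equal to the difficulty b, the probability is halfway between the guessing floor c and 1, which is one common way to read the difficulty parameter in this model.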

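  17. PENGEMBANGAN DAN ANALISIS SOAL ULANGAN KENAIKAN KELAS KIMIA SMA KELAS X BERDASARKAN CLASSICAL TEST THEORY DAN ITEM RESPONSE THEORY [Development and Analysis of Grade X Senior High School Chemistry Promotion Examination Items Based on Classical Test Theory and Item Response Theory]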

    Directory of Open Access Journals (Sweden)

    Mr Nahadi

    2011-10-01

    This research is titled “Test Development and Analysis of First Grade Senior High School Final Examination in Chemistry Based on Classical Test Theory and Item Response Theory”. It was conducted to develop a standard test instrument for the first-grade senior high school final examination, using analysis based on classical test theory and item response theory. The test is a multiple-choice test consisting of 75 items, each with five options. A research and development method was used to produce test items that fulfill item criteria such as validity, reliability, item discrimination, item difficulty, and distractor quality under classical test theory, and validity, reliability, item discrimination, item difficulty, and pseudo-guessing under item response theory. The three-parameter item response theory model was used. The research and development method was carried out through a preliminary field test with 102 first-grade senior high school students. Based on the results, the test fulfills the criteria for a good instrument under both classical test theory and item response theory. The final examination items vary in quality, so some of them need revision of either the stem or the options. Of the 75 test items, 21 were rejected and 54 were accepted.

  18. Applying Item Response Theory methods to design a learning progression-based science assessment

    Science.gov (United States)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats such as Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all

  19. Scale construction and evaluation in practice : A review of factor analysis versus item response theory applications

    NARCIS (Netherlands)

    Ten Holt, J.C.; van Duijn, M.A.J.; Boomsma, A.

    2010-01-01

    In scale construction and evaluation, factor analysis (FA) and item response theory (IRT) are two methods frequently used to determine whether a set of items reliably measures a latent variable. In a review of 41 published studies we examined which methodology – FA or IRT – was used, and what resear

  20. Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

    Science.gov (United States)

    Lee, Won-Chan

    2010-01-01

    In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

  1. An NCME Instructional Module on Estimating Item Response Theory Models Using Markov Chain Monte Carlo Methods

    Science.gov (United States)

    Kim, Jee-Seon; Bolt, Daniel M.

    2007-01-01

    The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…

  3. Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

    NARCIS (Netherlands)

    Holman, Rebecca; Glas, Cornelis A.W.; Lindeboom, Robert; Zwinderman, Aeilko H.; de Haan, Rob J.

    2004-01-01

    Background: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included

  4. Relationships among Classical Test Theory and Item Response Theory Frameworks via Factor Analytic Models

    Science.gov (United States)

    Kohli, Nidhi; Koran, Jennifer; Henn, Lisa

    2015-01-01

    There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…

  5. Characteristics of Items in the Eysenck Personality Inventory Which Affect Responses When Students Simulate

    Science.gov (United States)

    Power, R. P.; Macrae, K. D.

    1977-01-01

    A large sample of students completed Form A of the Eysenck Personality Inventory, and four subgroups were later asked to simulate extraversion, introversion, neuroticism or stability. It was found that subjects could simulate these four personalities successfully. The changes in individual item responses were correlated with the items' factor…

  6. Examining faking on personality inventories using unfolding item response theory models.

    Science.gov (United States)

    Scherbaum, Charles A; Sabet, Jennifer; Kern, Michael J; Agnello, Paul

    2013-01-01

    A concern about personality inventories in diagnostic and decision-making contexts is that individuals will fake. Although there is extensive research on faking, little research has focused on how perceptions of personality items change when individuals are faking or responding honestly. This research demonstrates how the delta parameter from the generalized graded unfolding item response theory model can be used to examine how individuals' perceptions about personality items might change when responding honestly or when faking. The results indicate that perceptions changed from honest to faking conditions for several neuroticism items. The direction of the change varied, indicating that faking can operate to increase or decrease scores within a personality factor.

  7. Asymptotic Properties of Induced Maximum Likelihood Estimates of Nonlinear Models for Item Response Variables: The Finite-Generic-Item-Pool Case.

    Science.gov (United States)

    Jones, Douglas H.

    The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…

  8. A model of hippocampal spiking responses to items during learning of a context-dependent task

    Directory of Open Access Journals (Sweden)

    Florian Raudies

    2014-09-01

    Single unit recordings in the rat hippocampus have demonstrated shifts in the specificity of spiking activity during learning of a contextual item-reward association task. In this task, rats received reward for responding to different items dependent upon the context an item appeared in, but not dependent upon the location an item appears at. Initially, neurons in the rat hippocampus primarily show firing based on place, but as the rat learns the task this firing became more selective for items. We simulated this effect using a simple circuit model with discrete inputs driving spiking activity representing place and item followed sequentially by a discrete representation of the motor actions involving a response to an item (digging for food) or the movement to a different item (movement to a different pot for food). We implemented spiking replay in the network representing neural activity observed during sharp-wave ripple events, and modified synaptic connections based on a simple representation of spike-timing dependent synaptic plasticity. This simple network was able to consistently learn the context-dependent responses, and transitioned from dominant coding of place to a gradual increase in specificity to items consistent with analysis of the experimental data. In addition, the model showed an increase in specificity toward context. The increase of selectivity in the model is accompanied by an increase in binariness of the synaptic weights for cells that are part of the functional network.

  9. Bayesian item response theory models for measurement variance

    NARCIS (Netherlands)

    Verhagen, A.J.

    2012-01-01

    Tests, surveys and questionnaires are all around us these days, and there is an increasing interest in comparing the resulting scores: between countries, between males and females, or over measurement occasions. In the design and analysis of such measurement instruments, a major concern is that the

  10. Attending multiple items decreases the selectivity of population responses in human primary visual cortex.

    Science.gov (United States)

    Anderson, David E; Ester, Edward F; Serences, John T; Awh, Edward

    2013-05-29

    Multiple studies have documented an inverse relationship between the number of to-be-attended or remembered items in a display ("set size") and task performance. The neural source of this decline in cognitive performance is currently under debate. Here, we used a combination of fMRI and a forward encoding model of orientation selectivity to generate population tuning functions for each of two stimuli while human observers attended either one or both items. We observed (1) clear population tuning functions for the attended item(s) that peaked at the stimulus orientation and decreased monotonically as the angular distance from this orientation increased, (2) a set-size-dependent decline in the relative precision of orientation-specific population responses, such that attending two items yielded a decline in selectivity of the population tuning function for each item, and (3) that the magnitude of the loss of precision in population tuning functions predicted individual differences in the behavioral cost of attending an additional item. These findings demonstrate that attending multiple items degrades the precision of perceptual representations for the target items and provides a straightforward account for the associated impairments in visually guided behavior.

  11. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    Science.gov (United States)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
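
    To make the Mantel-Haenszel log-odds ratio mentioned above concrete, here is a minimal sketch of the standard Mantel-Haenszel common odds ratio estimator applied to 2x2 tables, one per matched score level. The function name, the table layout (reference correct, reference incorrect, focal correct, focal incorrect), and the counts are assumptions made for illustration; the study's own data and its DDF extension are not reproduced here.

```python
import math
from typing import Iterable, Tuple

# One 2x2 table per stratum (for example, per matched total-score level):
# (reference correct, reference incorrect, focal correct, focal incorrect)
Table = Tuple[float, float, float, float]


def mh_log_odds_ratio(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel common log-odds ratio across strata.

    Positive values mean the item favors the reference group after
    matching; values near zero suggest little or no DIF.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        num += a * d / n
        den += b * c / n
    if num <= 0.0 or den <= 0.0:
        raise ValueError("log-odds ratio undefined for these tables")
    return math.log(num / den)


if __name__ == "__main__":
    # Hypothetical counts for three score strata.
    strata = [(12, 8, 9, 11), (20, 5, 16, 9), (25, 2, 22, 4)]
    print(round(mh_log_odds_ratio(strata), 4))
```

    In practice the estimate is usually reported alongside a standard error or a chi-square test, which this sketch omits.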

  12. Pain and distress caused by endotracheal suctioning in neonates is better quantified by behavioural than physiological items: a comparison based on item response theory modelling.

    Science.gov (United States)

    Välitalo, Pyry A J; van Dijk, Monique; Krekels, Elke H J; Gibbins, Sharyn; Simons, Sinno H P; Tibboel, Dick; Knibbe, Catherijne A J

    2016-08-01

    Pain cannot be directly measured in neonates. Therefore, scores based on indirect behavioural signals such as crying, or physiological signs such as blood pressure, are used to quantify neonatal pain both in clinical practice and in clinical studies. The aim of this study was to determine which of the physiological and behavioural items of 2 validated pain assessment scales (COMFORT and premature infant pain profile) are best able to detect pain during endotracheal and nasal suctioning in ventilated newborns. We analysed a total of 516 PIPP and COMFORT scores from 118 newborns. A graded response model was built to describe the data and item information was calculated for each of the behavioural and physiological items. We found that the graded response model was able to well describe the data, as judged by agreement between the observed data and model simulations. Furthermore, a good agreement was found between the pain estimated by the graded response model and the investigator-assessed visual analogue scale scores (Spearman rho correlation coefficient = 0.80). The information scores for the behavioural items ranged from 1.4 to 27.2 and from 0.0282 to 0.131 for physiological items. In these data with mild to moderate pain levels, behavioural items were vastly more informative of pain and distress than were physiological items. The items that were the most informative of pain are COMFORT items "calmness/agitation," "alertness," and "facial tension."
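
    The graded response model used in this abstract assigns each ordered response category a probability given the latent trait. The sketch below computes those category probabilities in the usual logistic form of Samejima's model, as differences between adjacent cumulative curves. The function name and the parameter values are hypothetical and are not the estimates reported for the COMFORT or PIPP items.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under a logistic graded response model.

    theta      : latent trait (here, an underlying pain level)
    a          : item discrimination
    thresholds : increasing boundaries b_1 < ... < b_m between the
                 m + 1 ordered categories
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(response >= k); category 0 has P = 1
    # and the sentinel after the last category has P = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical four-category item evaluated at a moderate trait level.
    probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.0])
    print([round(p, 4) for p in probs])
```

    Items with larger discrimination values produce more sharply separated category curves and therefore carry more information about the trait, which is the sense in which the abstract compares behavioural and physiological items.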

  13. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    Science.gov (United States)

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  14. Latent Variable Modelling and Item Response Theory Analyses in Marketing Research

    Directory of Open Access Journals (Sweden)

    Brzezińska Justyna

    2016-12-01

    Item Response Theory (IRT) is a modern statistical method using latent variables designed to model the interaction between a subject’s ability and the item level stimuli (difficulty, guessing). Item responses are treated as the outcome (dependent) variables, and the examinee’s ability and the items’ characteristics are the latent predictor (independent) variables. IRT models the relationship between a respondent’s trait (ability, attitude) and the pattern of item responses. Thus, the estimation of individual latent traits can differ even for two individuals with the same total scores. IRT scores can yield additional benefits and this will be discussed in detail. In this paper theory and application with R software with the use of packages designed for modelling IRT will be presented.
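
    The claim that two respondents with the same total score can receive different trait estimates can be illustrated with a small sketch. It assumes a logistic item response function with discrimination, difficulty and guessing parameters and a simple grid-search maximum-likelihood estimate; the function names and all item parameter values are hypothetical and are not taken from the paper or from any R package.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """Three-parameter logistic curve; c = 0 reduces it to the 2PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def ml_theta(responses, items, grid=None):
    """Grid-search maximum-likelihood trait estimate for one response pattern."""
    grid = grid or [g / 100.0 for g in range(-400, 401)]

    def loglik(theta):
        total = 0.0
        for x, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            total += math.log(p) if x == 1 else math.log(1.0 - p)
        return total

    return max(grid, key=loglik)

# Hypothetical (a, b, c) parameters for three items; not from any study.
items = [(0.5, -1.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]

theta_a = ml_theta([1, 1, 0], items)  # total score 2, misses the hardest item
theta_b = ml_theta([0, 1, 1], items)  # total score 2, misses the easiest item
```

    Both patterns have a total score of 2, but the second pattern endorses the more discriminating item, so its estimate is higher under these assumed parameters.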

  15. Measuring the quality of life in hypertension according to Item Response Theory

    Directory of Open Access Journals (Sweden)

    José Wicto Pereira Borges

    OBJECTIVE: To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS: This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the aid of the psych and mirt packages. RESULTS: The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state performed well, as they showed better power to discriminate individuals with worse quality of life. The items related to mental state were those that contributed the least psychometric information in the MINICHAL. CONCLUSIONS: We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies.

  16. Modeling and Testing Differential Item Functioning in Unidimensional Binary Item Response Models with a Single Continuous Covariate: A Functional Data Analysis Approach.

    Science.gov (United States)

    Liu, Yang; Magnus, Brooke E; Thissen, David

    2016-06-01

    Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.

  17. Test-retest reliability of selected items of Health Behaviour in School-aged Children (HBSC) survey questionnaire in Beijing, China

    Directory of Open Access Journals (Sweden)

    Liu Yang

    2010-08-01

    Background: Children's health and health behaviour are essential for their development and it is important to obtain abundant and accurate information to understand young people's health and health behaviour. The Health Behaviour in School-aged Children (HBSC) study is among the first large-scale international surveys on adolescent health through self-report questionnaires. So far, more than 40 countries in Europe and North America have been involved in the HBSC study. The purpose of this study is to assess the test-retest reliability of selected items in the Chinese version of the HBSC survey questionnaire in a sample of adolescents in Beijing, China. Methods: A sample of 95 male and female students aged 11 or 15 years old participated in a test and retest with a three-week interval. Student identity numbers of respondents were utilized to permit matching of test-retest questionnaires. 23 items concerning physical activity, sedentary behaviour, sleep and substance use were evaluated by using the percentage of response shifts and the single measure Intraclass Correlation Coefficients (ICC) with 95% confidence interval (CI) for all respondents and stratified by gender and age. Items on substance use were only evaluated for school children aged 15 years old. Results: The percentage of no response shift between test and retest varied from 32% for the item on computer use at weekends to 92% for the three items on smoking. Of all the 23 items evaluated, 6 items (26%) showed a moderate reliability, 12 items (52%) displayed a substantial reliability and 4 items (17%) indicated almost perfect reliability. No gender and age group difference of the test-retest reliability was found except for a few items on sedentary behaviour. Conclusions: The overall findings of this study suggest that most selected indicators in the HBSC survey questionnaire have satisfactory test-retest reliability for the students in Beijing. Further test-retest studies in a large

  18. Item Response Modeling of Paired Comparison and Ranking Data

    Science.gov (United States)

    Maydeu-Olivares, Alberto; Brown, Anna

    2010-01-01

    The comparative format used in ranking and paired comparisons tasks can significantly reduce the impact of uniform response biases typically associated with rating scales. Thurstone's (1927, 1931) model provides a powerful framework for modeling comparative data such as paired comparisons and rankings. Although Thurstonian models are generally…

  19. Information-Processing on Intelligence Test Items: Some Response Components

    Science.gov (United States)

    Whitely, Susan E.

    1977-01-01

    A factor analysis was used to study the relationships among response time and accuracy scores for a verbal analogies test, as well as a number of experimental variables designed to measure a series of information processing stages of the analogies task. (CTM)

  20. College Student Responses to Web and Paper Surveys: Does Mode Matter?

    Science.gov (United States)

    Carini, Robert M.; Hayek, John C.; Kuh, George D.; Kennedy, John M.; Ouimet, Judith A.

    2003-01-01

    Examined the responses of 58,288 college students to 8 scales involving 53 items from the National Survey of Student Engagement to gauge whether individuals respond differently to surveys administered via the Web and paper. Found that mode effects were generally small; however, students who completed the Web-based survey responded more favorably…

  1. Item response theory analyses of the Cambridge Face Memory Test (CFMT).

    Science.gov (United States)

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin Williams; Fiset, Daniel; Van Gulick, Ana E; Ryan, Kaitlin F; Gauthier, Isabel

    2015-06-01

    We evaluated the psychometric properties of the Cambridge Face Memory Test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bifactor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and 3 specific factors clustered by targets of CFMT. However, the 3 specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and 2 age groups (age ≤ 20 vs. age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT.

  2. Initial nonresponse and survey response mode biases in survey research.

    Science.gov (United States)

    Chi, Donald L; Chen, Chao Ying

    2015-01-01

    We evaluated survey response factors (particularly initial nonresponse and survey mode) that may be associated with bias in survey research. We examined prevention-related beliefs and outcomes for initial mail survey responders (n=209), follow-up mail survey responders (n=78), and follow-up telephone survey responders (n=74). The Pearson chi-square test and analysis of variance identified beliefs and behavioral outcomes associated with survey response mode. Follow-up options to the initial mail survey improved response rates (22.0-38.0 percent). Initial mail survey responders more strongly believed topical fluoride protects teeth from cavities than others (P=0.04). A significantly larger proportion of parents completing a follow-up telephone survey (30.8 percent) refused topical fluoride for their child than those completing mail surveys (10.3-10.4 percent) (P<…). … surveys with follow-up improve response rates. Initial nonresponse and survey response mode may be associated with biases in survey research. © 2015 American Association of Public Health Dentistry.

  3. A comparison of item response theory-based methods for examining differential item functioning in object naming test by language of assessment among older Latinos

    Directory of Open Access Journals (Sweden)

    Frances M. Yang

    2011-12-01

    Object naming tests are commonly included in neuropsychological test batteries. Differential item functioning (DIF) in these tests due to cultural and language differences may compromise the validity of cognitive measures in diverse populations. We evaluated 26 object naming items for DIF due to Spanish and English language translations among Latinos (n=1,159), mean age of 70.5 years old (Standard Deviation (SD)±7.2), using the following four item response theory-based approaches: Mplus/Multiple Indicator, Multiple Causes (Mplus/MIMIC; Muthén & Muthén, 1998-2011), Item Response Theory Likelihood Ratio Differential Item Functioning (IRTLRDIF/MULTILOG; Thissen, 1991, 2001), difwithpar/Parscale (Crane, Gibbons, Jolley, & van Belle, 2006; Muraki & Bock, 2003), and Differential Functioning of Items and Tests/MULTILOG (DFIT/MULTILOG; Flowers, Oshima, & Raju, 1999; Thissen, 1991). Overall, there was moderate to near perfect agreement across methods. Fourteen items were found to exhibit DIF and 5 items observed consistently across all methods, which were more likely to be answered correctly by individuals tested in Spanish after controlling for overall ability.

  4. A comparison of item response theory-based methods for examining differential item functioning in object naming test by language of assessment among older Latinos.

    Science.gov (United States)

    Yang, Frances M; Heslin, Kevin C; Mehta, Kala M; Yang, Cheng-Wu; Ocepek-Welikson, Katja; Kleinman, Marjorie; Morales, Leo S; Hays, Ron D; Stewart, Anita L; Mungas, Dan; Jones, Richard N; Teresi, Jeanne A

    2011-01-01

    Object naming tests are commonly included in neuropsychological test batteries. Differential item functioning (DIF) in these tests due to cultural and language differences may compromise the validity of cognitive measures in diverse populations. We evaluated 26 object naming items for DIF due to Spanish and English language translations among Latinos (n=1,159), mean age of 70.5 years old (Standard Deviation (SD)±7.2), using the following four item response theory-based approaches: Mplus/Multiple Indicator, Multiple Causes (Mplus/MIMIC; Muthén & Muthén, 1998-2011), Item Response Theory Likelihood Ratio Differential Item Functioning (IRTLRDIF/MULTILOG; Thissen, 1991, 2001), difwithpar/Parscale (Crane, Gibbons, Jolley, & van Belle, 2006; Muraki & Bock, 2003), and Differential Functioning of Items and Tests/MULTILOG (DFIT/MULTILOG; Flowers, Oshima, & Raju, 1999; Thissen, 1991). Overall, there was moderate to near perfect agreement across methods. Fourteen items were found to exhibit DIF and 5 items observed consistently across all methods, which were more likely to be answered correctly by individuals tested in Spanish after controlling for overall ability.

  5. Power analysis in randomized clinical trials based on item response theory

    NARCIS (Netherlands)

    Holman, Rebecca; Glas, Cees A.W.; Haan, de Rob J.

    2003-01-01

    Patient relevant outcomes, measured using questionnaires, are becoming increasingly popular endpoints in randomized clinical trials (RCTs). Recently, interest in the use of item response theory (IRT) to analyze the responses to such questionnaires has increased. In this paper, we used a simulation s

  6. Revision of the ICIDH Severity of Disabilities Scale by data linking and item response theory

    NARCIS (Netherlands)

    Buuren, S. van; Hopman-Rock, M.

    2001-01-01

    The Severity of Disabilities Scale (SDS) of the ICIDH reflects the degree to which an individual's ability to perform a certain activity is restricted. This paper describes the application of two models from item response theory (IRT), the graded response model and the partial credit model, in order

  7. Confidence Bands for the Three-Parameter Logistic Item Response Curve.

    Science.gov (United States)

    Lord, Frederic M.; Pashley, Peter J.

    A large sample method for obtaining asymptotic simultaneous confidence bands for a three-parameter logistic response curve is described. Simultaneous confidence bands indicate the sampling variation of item response curves relative to a fitted function. A procedure is given which requires as input maximum likelihood parameter estimates and an…

  8. Fitting Item Response Theory Models to Two Personality Inventories: Issues and Insights.

    Science.gov (United States)

    Chernyshenko, Oleksandr S.; Stark, Stephen; Chan, Kim-Yin; Drasgow, Fritz; Williams, Bruce

    2001-01-01

    Compared the fit of several Item Response Theory (IRT) models to two personality assessment instruments using data from 13,059 individuals responding to one instrument and 1,770 individuals responding to the other. Two- and three-parameter logistic models fit some scales reasonably well, but not others, and the graded response model generally did…

  9. Response rate, response time, and economic costs of survey research: A randomized trial of practicing pharmacists.

    Science.gov (United States)

    Hardigan, Patrick C; Popovici, Ioana; Carvajal, Manuel J

    2016-01-01

    There is a gap between increasing demands from pharmacy journals, publishers, and reviewers for high survey response rates and the actual responses often obtained in the field by survey researchers. Presumably demands have been set high because response rates, times, and costs affect the validity and reliability of survey results. Explore the extent to which survey response rates, average response times, and economic costs are affected by conditions under which pharmacist workforce surveys are administered. A random sample of 7200 U.S. practicing pharmacists was selected. The sample was stratified by delivery method, questionnaire length, item placement, and gender of respondent for a total of 300 observations within each subgroup. A job satisfaction survey was administered during March-April 2012. Delivery method was the only classification showing significant differences in response rates and average response times. The postal mail procedure accounted for the highest response rates of completed surveys, but the email method exhibited the quickest turnaround. A hybrid approach, consisting of a combination of postal and electronic means, showed the least favorable results. Postal mail was 2.9 times more cost effective than the email approach and 4.6 times more cost effective than the hybrid approach. Researchers seeking to increase practicing pharmacists' survey participation and reduce response time and related costs can benefit from the analytical procedures tested here. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Explanatory multidimensional multilevel random item response model: an application to simultaneous investigation of word and person contributions to multidimensional lexical representations.

    Science.gov (United States)

    Cho, Sun-Joo; Gilbert, Jennifer K; Goodwin, Amanda P

    2013-10-01

    This paper presents an explanatory multidimensional multilevel random item response model and its application to reading data with multilevel item structure. The model includes multilevel random item parameters that allow consideration of variability in item parameters at both item and item group levels. Item-level random item parameters were included to model unexplained variance remaining when item related covariates were used to explain variation in item difficulties. Item group-level random item parameters were included to model dependency in item responses among items having the same item stem. Using the model, this study examined the dimensionality of a person's word knowledge, termed lexical representation, and how aspects of morphological knowledge contributed to lexical representations for different persons, items, and item groups.

  11. The academic medical center linear disability score (ALDS) item bank: item response theory analysis in a mixed patient population

    NARCIS (Netherlands)

    Holman, Rebecca; Weisscher, Nadine; Glas, Cornelis A.W.; Dijkgraaf, Marcel G.W.; Vermeulen, Martinus; de Haan, Rob J.; Lindeboom, Robert

    2005-01-01

    Background: Currently, there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. This

  12. Measuring organizational effectiveness in information and communication technology companies using item response theory.

    Science.gov (United States)

    Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Pereira, Vera Lúcia Duarte do Valle; Pacheco, Waldemar; Bornia, Antonio Cezar; de Andrade, Dalton Francisco

    2012-01-01

    The aim of this paper is to measure the effectiveness of Information and Communication Technology (ICT) organizations from the point of view of the manager, using Item Response Theory (IRT). There is a need to verify the effectiveness of these organizations, which are normally associated with complex, dynamic, and competitive environments. In the academic literature, there is disagreement surrounding the concept of organizational effectiveness and its measurement. A construct was elaborated based on dimensions of effectiveness to guide the construction of the questionnaire items, which were submitted to specialists for evaluation. The approach proved viable for measuring the organizational effectiveness of ICT companies from the point of view of a manager using the Two-Parameter Logistic Model (2PLM) of IRT. This modeling permits us to evaluate the quality and properties of each item and to place items and respondents on a single scale, which is not possible when using other similar tools.

  13. Scale construction and evaluation in practice: A review of factor analysis versus item response theory applications

    Directory of Open Access Journals (Sweden)

    Anne Boomsma

    2010-09-01

    In scale construction and evaluation, factor analysis (FA) and item response theory (IRT) are two methods frequently used to determine whether a set of items reliably measures a latent variable. In a review of 41 published studies we examined which methodology – FA or IRT – was used, and what researchers’ motivations were for applying either method. Characteristics of the studies were compared to gain more insight into the practice of scale analysis. Findings indicate that FA is applied far more often than IRT. Many times it is unclear whether the data justify the chosen method because model assumptions are neglected. We recommend that researchers (a) use substantive knowledge about the items to their advantage by more frequently employing confirmatory techniques, as well as adding item content and interpretability of factors to the criteria in model evaluation; and (b) investigate model assumptions and report corresponding findings. To this end, we recommend more collaboration between substantive researchers and statisticians/psychometricians.

  14. Integrating competing dimensional models of personality: linking the SNAP, TCI, and NEO using Item Response Theory.

    Science.gov (United States)

    Stepp, Stephanie D; Yu, Lan; Miller, Joshua D; Hallquist, Michael N; Trull, Timothy J; Pilkonis, Paul A

    2012-04-01

    Mounting evidence suggests that several inventories assessing both normal personality and personality disorders measure common dimensional personality traits (i.e., Antagonism, Constraint, Emotional Instability, Extraversion, and Unconventionality), albeit providing unique information along the underlying trait continuum. We used Widiger and Simonsen's (2005) pantheoretical integrative model of dimensional personality assessment as a guide to create item pools. We then used Item Response Theory (IRT) to compare the assessment of these five personality traits across three established dimensional measures of personality: the Schedule for Nonadaptive and Adaptive Personality (SNAP), the Temperament and Character Inventory (TCI), and the Revised NEO Personality Inventory (NEO PI-R). We found that items from each inventory map onto these five common personality traits in predictable ways. The IRT analyses, however, documented considerable variability in the item and test information derived from each inventory. Our findings support the notion that the integration of multiple perspectives will provide greater information about personality while minimizing the weaknesses of any single instrument.

  15. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

    Directory of Open Access Journals (Sweden)

    Stochl Jan

    2012-06-01

    Background: Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods: Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions: After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). An illustration of ordinal item analysis

  16. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

    Science.gov (United States)

    Stochl, Jan; Jones, Peter B; Croudace, Tim J

    2012-06-11

    Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental
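
    The notion of scalability in Mokken analysis is usually quantified with Loevinger's H coefficient, which compares observed inter-item covariances with the largest covariances attainable given the item marginals. The sketch below computes a scale-level H for dichotomous items under that definition; the function name and the toy data are hypothetical, and this is an illustration rather than the procedure used in the paper.

```python
from itertools import combinations

def loevinger_h(data):
    """Scale-level Loevinger's H for a list of 0/1 response rows."""
    n = len(data)
    k = len(data[0])
    # Item popularities: proportion of respondents scoring 1 on each item.
    p = [sum(row[j] for row in data) / n for j in range(k)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        # Proportion of respondents scoring 1 on both items.
        p_ij = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_ij - p[i] * p[j]
        # Largest covariance attainable given the two item marginals.
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum

# Toy data forming a perfect Guttman pattern (an assumption for illustration).
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
h = loevinger_h(guttman)
```

    A perfect Guttman pattern has no scaling errors, so every observed covariance equals its maximum and H equals 1; real questionnaire data yield lower values.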

  17. mirt: A Multidimensional Item Response Theory Package for the R Environment

    Directory of Open Access Journals (Sweden)

    R. Philip Chalmers

    2012-05-01

    Item response theory (IRT) is widely used in assessment and evaluation research to explain how participants respond to item level stimuli. Several R packages can be used to estimate the parameters in various IRT models, the most flexible being the ltm (Rizopoulos 2006), eRm (Mair and Hatzinger 2007), and MCMCpack (Martin, Quinn, and Park 2011) packages. However these packages have limitations in that ltm and eRm can only analyze unidimensional IRT models effectively and the exploratory multidimensional extensions available in MCMCpack require prior understanding of Bayesian estimation convergence diagnostics and are computationally intensive. Most importantly, multidimensional confirmatory item factor analysis methods have not been implemented in any R package. The mirt package was created for estimating multidimensional item response theory parameters for exploratory and confirmatory models by using maximum-likelihood methods. The Gauss-Hermite quadrature method used in traditional EM estimation (e.g., Bock and Aitkin 1981) is presented for exploratory item response models as well as for confirmatory bifactor models (Gibbons and Hedeker 1992). Exploratory and confirmatory models are estimated by a stochastic algorithm described by Cai (2010a,b). Various program comparisons are presented and future directions for the package are discussed.

  18. Disability Items From the Current Population Survey (2008-2015) and Permanent Versus Temporary Disability Status.

    Science.gov (United States)

    Ward, Bryce; Myers, Andrew; Wong, Jennifer; Ravesloot, Craig

    2017-05-01

    To examine longitudinal responses to the disability indicator questions that have been adopted as the standard across national surveys sponsored by the US Department of Health and Human Services. Data from the Current Population Survey between 2008 and 2015 were linked to create a longitudinal sample of 721 178 individual respondents. Responses to the disability questions fluctuated significantly. Although 17% of all respondents reported a disability at some point, only 3% consistently reported the same set of disabilities. Demographic differences were found between people who always reported a consistent set of disabilities and those whose responses fluctuated. The disability questions capture 2 discrete groups: people who experience a permanent disability and those who experience a temporary disability. Demographic differences between these groups suggest that this is not simply due to measurement error.

  19. Predicting gender differences as latent variables: summed scores, and individual item responses: a methods case study

    Directory of Open Access Journals (Sweden)

    Jacobs Danny O

    2004-10-01

    Background: Modeling latent variables such as physical disability is challenging since their measurement is performed through proxies. This poses significant methodological challenges. The objective of this article is to present three different methods to predict latent variables based on classical summed scores, individual item responses, and latent variable models. Methods: This is a review of the literature and data analysis using "layers of information". Data were collected from the North Carolina Back Pain Project, using a modified version of the Roland Questionnaire. Results: The three models are compared in relation to their goals and underlying concepts, previous clinical applications, data requirements, statistical theory, and practical applications. Initial linear regression models demonstrated a difference in disability between genders of 1.32 points (95% CI 0.65, 2.00) on a scale from 0–23. Subsequent item analysis found contradictory results across items, with no clear pattern. Finally, IRT models demonstrated that three items presented differential item functioning. After these items were removed, the difference between genders was reduced to 0.78 points (95% CI, -0.99, 1.23). These results were shown to be robust with re-sampling methods. Conclusions: Purported differences in the levels of a latent variable should be tested using different models to verify whether these differences are real or simply distorted by model assumptions.

  20. Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests.

    Science.gov (United States)

    Fox, Mark C; Berry, Jane M; Freeman, Sara P

    2014-12-01

    Relatively high vocabulary scores of older adults are generally interpreted as evidence that older adults possess more of a common ability than younger adults. Yet, this interpretation rests on empirical assumptions about the uniformity of item-response functions between groups. In this article, we test item response models of differential responding against datasets containing younger-, middle-aged-, and older-adult responses to three popular vocabulary tests (the Shipley, Ekstrom, and WAIS-R) to determine whether members of different age groups who achieve the same scores have the same probability of responding in the same categories (e.g., correct vs. incorrect) under the same conditions. Contrary to the null hypothesis of measurement invariance, datasets for all three tests exhibit substantial differential responding. Members of different age groups who achieve the same overall scores exhibit differing response probabilities in relation to the same items (differential item functioning) and appear to approach the tests in qualitatively different ways that generalize across items. Specifically, younger adults are more likely than older adults to leave items unanswered for partial credit on the Ekstrom, and to produce 2-point definitions on the WAIS-R. Yet, older adults score higher than younger adults, consistent with most reports of vocabulary outcomes in the cognitive aging literature. In light of these findings, the most generalizable conclusion to be drawn from the cognitive aging literature on vocabulary tests is simply that older adults tend to score higher than younger adults, and not that older adults possess more of a common ability.

  1. A preference-based measure of health: the VR-6D derived from the veterans RAND 12-Item Health Survey.

    Science.gov (United States)

    Selim, Alfredo J; Rogers, William; Qian, Shirley X; Brazier, John; Kazis, Lewis E

    2011-10-01

    The Veterans RAND 12-Item Health Survey (VR-12) is currently the major endpoint used in the Medicare managed care outcomes measure in the Healthcare Effectiveness Data and Information Set (HEDIS(®)), referred to as the Health Outcomes Survey (HOS). The purpose of this study is to adapt the Brazier SF-6D utility measure to the VR-12 to generate a single utility index. We used the HOS cohorts 2 and 3 for SF-36 data and 9 for VR-12 data. We calculated SF-6D scores from the SF-36 using the algorithms developed by Brazier and colleagues. The values of the Brazier SF-6D were used to estimate utility scores from the VR-12 using a mapping approach based on a 2-stage mapping procedure, named as VR-6D. The VR-6D derived from the VR-12 has similar distributional properties as the SF-6D. The change in VR-6D showed significant variations across disease groups with different levels of morbidity and mortality. This study produced a utility measure for the VR-12 that is comparable to the SF-6D and responsive to change. The VR-6D can be used in evaluations of health care plans and cost-effectiveness analysis to compare the health gains that health care interventions can achieve.

  2. Target Rotations and Assessing the Impact of Model Violations on the Parameters of Unidimensional Item Response Theory Models

    Science.gov (United States)

    Reise, Steven; Moore, Tyler; Maydeu-Olivares, Alberto

    2011-01-01

    Reise, Cook, and Moore proposed a "comparison modeling" approach to assess the distortion in item parameter estimates when a unidimensional item response theory (IRT) model is imposed on multidimensional data. Central to their approach is the comparison of item slope parameter estimates from a unidimensional IRT model (a restricted model), with…

  3. Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

    Science.gov (United States)

    Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

    2016-01-01

    The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…

  4. Effect of survey mode on response patterns

    DEFF Research Database (Denmark)

    Christensen, Anne Illemann; Ekholm, Ola; Glümer, Charlotte

    2014-01-01

    BACKGROUND: While face-to-face interviews are considered the gold standard of survey modes, self-administered questionnaires are often preferred for cost and convenience. This article examines response patterns in two general population health surveys carried out by face-to-face interview and self-administered questionnaire, respectively. METHOD: Data derives from a health interview survey in the Region of Southern Denmark (face-to-face interview) and The Danish Health and Morbidity Survey 2010 (self-administered questionnaire). Identical questions were used in both surveys. Data on all individuals were obtained from administrative registers and linked to survey data at individual level. Multiple logistic regression analyses were used to examine the effect of survey mode on response patterns. RESULTS: The non-response rate was higher in the self-administered survey (37.9%) than in the face-to-face interview survey (23...

  5. The challenges of fitting an item response theory model to the Social Anhedonia Scale.

    Science.gov (United States)

    Reise, Steven P; Horan, William P; Blanchard, Jack J

    2011-05-01

    This study explored the application of latent variable measurement models to the Social Anhedonia Scale (SAS; Eckblad, Chapman, Chapman, & Mishlove, 1982), a widely used and influential measure in schizophrenia-related research. Specifically, we applied unidimensional and bifactor item response theory (IRT) models to data from a community sample of young adults (n = 2,227). Ordinal factor analyses revealed that identifying a coherent latent structure in the 40-item SAS data was challenging due to (a) the presence of multiple small content clusters (e.g., doublets); (b) modest relations between those clusters, which, in turn, implies a general factor of only modest strength; (c) items that shared little variance with the majority of items; and (d) cross-loadings in bifactor solutions. Consequently, we conclude that SAS responses cannot be modeled accurately by either unidimensional or bifactor IRT models. Although the application of a bifactor model to a reduced 17-item set met with better success, significant psychometric and substantive problems remained. Results highlight the challenges of applying latent variable models to scales that were not originally designed to fit these models.

  6. Development of the Knee Quality of Life (KQoL-26) 26-item questionnaire: data quality, reliability, validity and responsiveness

    Directory of Open Access Journals (Sweden)

    Atwell Chris

    2008-07-01

    Background: This article describes the development and validation of a self-reported questionnaire, the KQoL-26, that is based on the views of patients with a suspected ligamentous or meniscal injury of the knee and that assesses the impact of their knee problem on the quality of their lives. Methods: Patient interviews and focus groups were used to derive questionnaire content. The instrument was assessed for data quality, reliability, validity, and responsiveness using data from a randomised trial and patient survey about general practitioners' use of Magnetic Resonance Imaging for patients with a suspected ligamentous or meniscal injury. Results: Interview and focus group data produced a 40-item questionnaire designed for self-completion. 559 trial patients and 323 survey patients responded to the questionnaire. Following principal components analysis and Rasch analysis, 26 items were found to contribute to three scales of knee-related quality of life: physical functioning, activity limitations, and emotional functioning. Item-total correlations ranged from 0.60–0.82. Cronbach's alpha and test-retest reliability estimates were 0.91–0.94 and 0.80–0.93 respectively. Hypothesised correlations with the Lysholm Knee Scale, EQ-5D, SF-36 and knee symptom questions were evidence for construct validity. The instrument produced highly significant change scores for 65 trial patients indicating that their knee was a little or somewhat better at six months. The new instrument had higher effect sizes (range 0.86–1.13) and responsiveness statistics (range 1.50–2.13) than the EQ-5D and SF-36. Conclusion: The KQoL-26 has good evidence for internal reliability, test-retest reliability, validity and responsiveness, and is recommended for use in randomised trials and other evaluative studies of patients with a suspected ligamentous or meniscal injury.

  7. Application of Item Response Theory to Modeling of Expanded Disability Status Scale in Multiple Sclerosis.

    NARCIS (Netherlands)

    Novakovic, A.M.; Krekels, E.H.; Munafo, A.; Ueckert, S.; Karlsson, M.O.

    2016-01-01

    In this study, we report the development of the first item response theory (IRT) model within a pharmacometrics framework to characterize the disease progression in multiple sclerosis (MS), as measured by Expanded Disability Status Score (EDSS). Data were collected quarterly from a 96-week phase III

  8. Assessing Dimensionality of Noncompensatory Multidimensional Item Response Theory with Complex Structures

    Science.gov (United States)

    Svetina, Dubravka

    2013-01-01

    The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…

  9. Extended Mixed-Effects Item Response Models with the MH-RM Algorithm

    Science.gov (United States)

    Chalmers, R. Philip

    2015-01-01

    A mixed-effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed-effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis-Hastings Robbins-Monro (MH-RM) stochastic imputation algorithm to accommodate for…

  10. Comparison of Item Response Theory and Thurstone Methods of Vertical Scaling.

    Science.gov (United States)

    Burket, George R.; Yen, Wendy M.

    1997-01-01

    Using simulated data modeled after real tests, a Thurstone method (L. Thurstone, 1925 and later) and three-parameter item response theory were compared for vertical scaling. Neither procedure produced artificial scale shrinkage, and both produced modest scale expansion for one simulated condition. (SLD)

  11. A Comparison of Developmental Scales Based on Thurstone Methods and Item Response Theory.

    Science.gov (United States)

    Williams, Valerie S. L.; Pommerich, Mary; Thissen, David

    1998-01-01

    Created a developmental scale for the North Carolina End-of-Grade Mathematics Tests using a subset of identical test forms administered to adjacent grade levels with Thurstone scaling and Item Response Theory methods. Discusses differences in patterns produced. (Author/SLD)

  12. Mokken scale analysis : Between the Guttman scale and parametric item response theory

    NARCIS (Netherlands)

    van Schuur, Wijbrandt H.

    2003-01-01

    This article introduces a model of ordinal unidimensional measurement known as Mokken scale analysis. Mokken scaling is based on principles of Item Response Theory (IRT) that originated in the Guttman scale. I compare the Mokken model with both Classical Test Theory (reliability or factor analysis)

  13. Measuring Integration of Information and Communication Technology in Education: An Item Response Modeling Approach

    Science.gov (United States)

    Peeraer, Jef; Van Petegem, Peter

    2012-01-01

    This research describes the development and validation of an instrument to measure integration of Information and Communication Technology (ICT) in education. After literature research on definitions of integration of ICT in education, a comparison is made between the classical test theory and the item response modeling approach for the…

  14. Item response modeling: an evaluation of the children's fruit and vegetable self-efficacy questionnaire

    Science.gov (United States)

    Perceived self-efficacy (SE) for eating fruit and vegetables (FV) is a key variable mediating FV change in interventions. This study applies item response modeling (IRM) to a fruit, juice and vegetable self-efficacy questionnaire (FVSEQ) previously validated with classical test theory (CTT) procedur...

  15. Assessing Model Data Fit of Unidimensional Item Response Theory Models in Simulated Data

    Science.gov (United States)

    Kose, Ibrahim Alper

    2014-01-01

    The purpose of this paper is to give an example of how to assess the model-data fit of unidimensional IRT models in simulated data. Also, the present research aims to explain the importance of fit and the consequences of misfit by using simulated data sets. Responses of 1000 examinees to a dichotomously scored 20-item test were simulated with 25…

  16. Can a Multidimensional Test Be Evaluated with Unidimensional Item Response Theory?

    Science.gov (United States)

    Wiberg, Marie

    2012-01-01

    The aim of this study was to evaluate possible consequences of using unidimensional item response theory (UIRT) on a multidimensional college admission test. The test consists of 5 subscales and can be divided into two sections, that is, it can be considered both as a unidimensional and a multidimensional test. The test was examined with both UIRT…

  17. The Value of Item Response Theory in Clinical Assessment: A Review

    Science.gov (United States)

    Thomas, Michael L.

    2011-01-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…

  18. An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

    Science.gov (United States)

    Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.

    2014-01-01

    As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…

  19. The Long-Term Sustainability of Different Item Response Theory Scaling Methods

    Science.gov (United States)

    Keller, Lisa A.; Keller, Robert R.

    2011-01-01

    This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests…

  20. Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models

    Science.gov (United States)

    Doebler, Anna; Doebler, Philipp; Holling, Heinz

    2013-01-01

    The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter [theta] is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the given…
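
    The normal-approximation interval that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a Rasch model with known item difficulties, and the function names (rasch_prob, total_information, ml_theta, wald_interval), the response pattern, and the difficulty values are all hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def total_information(theta, difficulties):
    """Fisher information about theta summed over items: sum of p * (1 - p)."""
    probs = [rasch_prob(theta, b) for b in difficulties]
    return sum(p * (1.0 - p) for p in probs)


def ml_theta(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum likelihood estimate of theta by Newton-Raphson (assumed setup)."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        # The ML estimate diverges for all-incorrect or all-correct patterns.
        raise ValueError("ML estimate is not finite for this response pattern")
    theta = 0.0
    for _ in range(max_iter):
        # Score function: observed minus expected number of correct responses.
        score = sum(x - rasch_prob(theta, b) for x, b in zip(responses, difficulties))
        step = score / total_information(theta, difficulties)
        theta += step
        if abs(step) < tol:
            break
    return theta


def wald_interval(responses, difficulties, z=1.959964):
    """Normal-approximation interval: theta_hat +/- z / sqrt(information)."""
    theta_hat = ml_theta(responses, difficulties)
    se = 1.0 / math.sqrt(total_information(theta_hat, difficulties))
    return theta_hat - z * se, theta_hat, theta_hat + z * se


# Hypothetical five-item test; short tests are where the abstract says
# this approximation tends to be inadequate.
low, est, high = wald_interval([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
print(f"theta_hat = {est:.3f}, 95% Wald CI = [{low:.3f}, {high:.3f}]")
```

    With only a handful of items the standard error is large and the interval is forced to be symmetric around the estimate, which is one intuition for why coverage can suffer on short tests.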

  1. A Multidimensional Item Response Modeling Approach for Improving Subscale Proficiency Estimation and Classification

    Science.gov (United States)

    Yao, Lihua; Boughton, Keith A.

    2007-01-01

    Several approaches to reporting subscale scores can be found in the literature. This research explores a multidimensional compensatory dichotomous and polytomous item response theory modeling approach for subscale score proficiency estimation, leading toward a more diagnostic solution. It also develops and explores the recovery of a Markov chain…

  2. Bayesian modeling of measurement error in predictor variables using item response theory

    NARCIS (Netherlands)

    Fox, Gerardus J.A.; Glas, Cornelis A.W.

    2000-01-01

    This paper focuses on handling measurement error in predictor variables using item response theory (IRT). Measurement error is of great importance in assessment of theoretical constructs, such as intelligence or the school climate. Measurement error is modeled by treating the predictors as unobserved

  4. Bayesian modeling of measurement error in predictor variables using item response theory

    NARCIS (Netherlands)

    Fox, Jean-Paul; Glas, Cees A.W.

    2000-01-01

    This paper focuses on handling measurement error in predictor variables using item response theory (IRT). Measurement error is of great importance in assessment of theoretical constructs, such as intelligence or the school climate. Measurement error is modeled by treating the predictors as unobserved

  5. Bayesian modeling of measurement error in predictor variables using item response theory

    NARCIS (Netherlands)

    Fox, Jean-Paul; Glas, Cees A.W.

    2003-01-01

    It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between t

  7. Development of a Microsoft Excel tool for one-parameter Rasch model of continuous items: an application to a safety attitude survey

    Directory of Open Access Journals (Sweden)

    Tsair-Wei Chien

    2017-01-01

    Background: Many continuous item responses (CIRs) are encountered in healthcare settings, but no one uses item response theory’s (IRT) probabilistic modeling to produce graphical presentations for interpreting CIR results. A computer module programmed to deal with CIRs is required. The aim was to present such a module, validate it, verify its usefulness in dealing with CIR data, and then apply the model to real healthcare data to show how CIRs can be applied in healthcare settings, using a safety attitude survey as an example. Methods: Using Microsoft Excel VBA (Visual Basic for Applications), we designed a computer module that minimizes the residuals and calculates the model’s expected scores according to person responses across items. Rasch models based on a Wright map and on KIDMAP were demonstrated to interpret results of the safety attitude survey. Results: The author-made CIR module yielded OUTFIT mean square (MNSQ) and person measures equivalent to those yielded by the professional Rasch software Winsteps. The probabilistic modeling of the CIR module provides messages that are much more valuable to users and shows the CIR advantage over classical test theory. Conclusions: Because of advances in computer technology, healthcare users who are familiar with MS Excel can easily apply the study's CIR module to continuous variables, benefiting comparisons of data with a logistic distribution and model fit statistics.

  8. Analysis of Multiple Partially Ordered Responses to Belief Items with Don't Know Option.

    Science.gov (United States)

    Ip, Edward H; Chen, Shyh-Huei; Quandt, Sara A

    2016-06-01

    Understanding beliefs, values, and preferences of patients is a tenet of contemporary health sciences. This application was motivated by the analysis of multiple partially ordered set (poset) responses from an inventory on layman beliefs about diabetes. The partially ordered set arises because of two features in the data: first, the response options contain a Don't Know (DK) option, and second, there were two consecutive occasions of measurement. As predicted by the common sense model of illness, beliefs about diabetes were not necessarily stable across the two measurement occasions. Instead of analyzing the two occasions separately, we studied the joint responses across the occasions as a poset response. Few analytic methods exist for data structures other than ordered or nominal categories. Poset responses are routinely collapsed and then analyzed as either rank ordered or nominal data, leading to the loss of nuanced information that might be present within poset categories. In this paper we developed a general class of item response models for analyzing the poset data collected from the Common Sense Model of Diabetes Inventory. The inferential object of interest is the latent trait that indicates congruence of belief with the biomedical model. To apply an item response model to the poset diabetes inventory, we proved that a simple coding algorithm circumvents the requirement of writing new code, so that standard IRT software could be used directly for item estimation and individual scoring. Simulation experiments were used to examine parameter recovery for the proposed poset model.

  9. Latent Variable Selection for Multidimensional Item Response Theory Models via L1 Regularization.

    Science.gov (United States)

    Sun, Jianan; Chen, Yunxiao; Liu, Jingchen; Ying, Zhiliang; Xin, Tao

    2016-12-01

    We develop a latent variable selection method for multidimensional item response theory models. The proposed method identifies latent traits probed by items of a multidimensional test. Its basic strategy is to impose an L1 penalty term on the log-likelihood. The computation is carried out by the expectation-maximization algorithm combined with the coordinate descent algorithm. Simulation studies show that the resulting estimator provides an effective way of correctly identifying the latent structures. The method is applied to a real dataset involving the Eysenck Personality Questionnaire.

  10. A modular approach for item response theory modeling with the R package flirt.

    Science.gov (United States)

    Jeon, Minjeong; Rijmen, Frank

    2016-06-01

    The new R package flirt is introduced for flexible item response theory (IRT) modeling of psychological, educational, and behavior assessment data. flirt integrates a generalized linear and nonlinear mixed modeling framework with graphical model theory. The graphical model framework allows for efficient maximum likelihood estimation. The key feature of flirt is its modular approach to facilitate convenient and flexible model specifications. Researchers can construct customized IRT models by simply selecting various modeling modules, such as parametric forms, number of dimensions, item and person covariates, person groups, link functions, etc. In this paper, we describe major features of flirt and provide examples to illustrate how flirt works in practice.

  11. Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models.

    Science.gov (United States)

    Ip, Edward Haksing

    2010-05-01

    Multidimensionality is a core concept in the measurement and analysis of psychological data. In personality assessment, for example, constructs are mostly theoretically defined as unidimensional, yet responses collected from the real world are almost always determined by multiple factors. Significant research efforts have concentrated on the use of simulated studies to evaluate the robustness of unidimensional item response models when applied to multidimensional data with a dominant dimension. In contrast, in the present paper, I report the result from a theoretical investigation that a multidimensional item response model is empirically indistinguishable from a locally dependent unidimensional model, of which the single dimension represents the actual construct of interest. A practical implication of this result is that multidimensional response data do not automatically require the use of multidimensional models. Circumstances under which the alternative approach of locally dependent unidimensional models may be useful are discussed.

  12. The Impact of Outliers on Cronbach's Coefficient Alpha Estimate of Reliability: Ordinal/Rating Scale Item Responses

    Science.gov (United States)

    Liu, Yan; Wu, Amery D.; Zumbo, Bruno D.

    2010-01-01

    In a recent Monte Carlo simulation study, Liu and Zumbo showed that outliers can severely inflate the estimates of Cronbach's coefficient alpha for continuous item response data--visual analogue response format. Little, however, is known about the effect of outliers for ordinal item response data--also commonly referred to as Likert, Likert-type,…
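
    For readers who want the statistic under discussion in concrete form, Cronbach's coefficient alpha can be computed from a respondents-by-items score matrix as sketched below. This is a minimal illustration using the standard formula, not code from the study; the function name cronbach_alpha and the rating data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical 5-point rating-scale responses: 6 respondents, 4 items.
ratings = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
]
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```

    Because alpha depends only on item and total-score variances, a single extreme response pattern changes both terms at once, which is why sensitivity to outliers is worth checking by recomputing alpha with and without suspect rows.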

  14. Surveillance indicators for potential reduced exposure products (PREPs): developing survey items to measure awareness

    Directory of Open Access Journals (Sweden)

    McNeill Ann

    2009-10-01

    Background: Over the past decade, tobacco companies have introduced cigarettes and smokeless tobacco products (known as Potential Reduced Exposure Products, PREPs) with purportedly lower levels of some toxins than conventional cigarettes and smokeless products. It is essential that public health agencies monitor awareness, interest, use, and perceptions of these products so that their impact on population health can be detected at the earliest stages. Methods: This paper reviews and critiques existing strategies for measuring awareness of PREPs from 16 published and unpublished studies. From these measures, we developed new surveillance items and subjected them to two rounds of cognitive testing, a common and accepted method for evaluating questionnaire wording. Results: Our review suggests that high levels of awareness of PREPs reported in some studies are likely to be inaccurate. Two likely sources of inaccuracy in awareness measures were identified: (1) the tendency of respondents to misclassify "no additive" and "natural" cigarettes as PREPs and (2) the tendency of respondents to mistakenly report awareness as a result of confusion between PREP brands and similarly named familiar products, for example, Eclipse chewing gum and Accord automobiles. Conclusion: After evaluating new measures with cognitive interviews, we conclude that as of winter 2006, awareness of reduced exposure products among U.S. smokers was likely to be between 1% and 8%, with the higher estimates for some products occurring in test markets. Recommended measurement strategies for future surveys are presented.

  15. Developing energy and momentum conceptual survey (EMCS) with four-tier diagnostic test items

    Science.gov (United States)

    Afif, Nur Faadhilah; Nugraha, Muhammad Gina; Samsudin, Achmad

    2017-05-01

    Students' conceptions of work and energy are important to support the learning process in the classroom. For that reason, a diagnostic test instrument is needed to diagnose students' conceptions of work and energy. The researchers therefore decided to develop the Energy and Momentum Conceptual Survey (EMCS) test instrument into four-tier diagnostic test items. This research is the first step in developing the four-tier EMCS as a diagnostic test instrument on work and energy. The research method used the 4D model (Defining, Designing, Developing and Disseminating). The instrument developed was tested on 39 students at a senior high school. The results showed that the four-tier EMCS is able to diagnose students' level of conception of work and energy. It can be concluded that the four-tier EMCS is a potential diagnostic test instrument that is able to categorize students as understanding the concepts, holding misconceptions, or not understanding the work and energy concepts at all.

  16. Application of Item Response Theory to Modeling of Expanded Disability Status Scale in Multiple Sclerosis.

    Science.gov (United States)

    Novakovic, A M; Krekels, E H J; Munafo, A; Ueckert, S; Karlsson, M O

    2017-01-01

    In this study, we report the development of the first item response theory (IRT) model within a pharmacometrics framework to characterize the disease progression in multiple sclerosis (MS), as measured by Expanded Disability Status Score (EDSS). Data were collected quarterly from a 96-week phase III clinical study by a blinded rater, involving 104,206 item-level observations from 1319 patients with relapsing-remitting MS (RRMS), treated with placebo or cladribine. Observed scores for each EDSS item were modeled describing the probability of a given score as a function of patients' (unobserved) disability using a logistic model. Longitudinal data from placebo arms were used to describe the disease progression over time, and the model was then extended to cladribine arms to characterize the drug effect. Sensitivity with respect to patient disability was calculated as Fisher information for each EDSS item, which were ranked according to the amount of information they contained. The IRT model was able to describe baseline and longitudinal EDSS data on item and total level. The final model suggested that cladribine treatment significantly slows disease-progression rate, with a 20% decrease in disease-progression rate compared to placebo, irrespective of exposure, and effects an additional exposure-dependent reduction in disability progression. Four out of eight items contained 80% of information for the given range of disabilities. This study has illustrated that IRT modeling is specifically suitable for accurate quantification of disease status and description and prediction of disease progression in phase 3 studies on RRMS, by integrating EDSS item-level data in a meaningful manner.
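
    The idea of ranking items by Fisher information can be illustrated with a short sketch. It is deliberately simplified and is not the study's model: EDSS items are scored on ordered categories, whereas this example assumes dichotomous items under a two-parameter logistic (2PL) model, and the item names, parameter values, and disability grid are hypothetical.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def rank_items(items, thetas):
    """Rank items by their share of total information over a grid of theta values.

    items maps an item name to (discrimination a, location b).
    Returns (name, share) pairs, most informative first.
    """
    totals = {
        name: sum(item_information(t, a, b) for t in thetas)
        for name, (a, b) in items.items()
    }
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    return [(name, info / grand_total) for name, info in ranked]


# Hypothetical items and disability range, for illustration only.
items = {"item_A": (2.0, 0.0), "item_B": (1.0, 0.0), "item_C": (0.5, 1.0)}
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
for name, share in rank_items(items, thetas):
    print(f"{name}: {share:.1%} of total information")
```

    Summing the shares from the top of the ranking gives a statement of the same form as the abstract's "four out of eight items contained 80% of information", although the numbers here carry no empirical meaning.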

  17. Evaluation of a skin self examination attitude scale using an item response theory model approach.

    Science.gov (United States)

    Djaja, Ngadiman; Youl, Pip; Aitken, Joanne; Janda, Monika

    2014-12-24

    The Skin Self-Examination Attitude Scale (SSEAS) is a brief measure that allows for the assessment of attitudes in relation to skin self-examination. This study evaluated the psychometric properties of the SSEAS using Item Response Theory (IRT) methods in a large sample of men ≥ 50 years in Queensland, Australia. A sample of 831 men (420 intervention and 411 control) completed a telephone assessment at the 13-month follow-up of a randomized-controlled trial of a video-based intervention to improve skin self-examination (SSE) behaviour. Descriptive statistics (mean, standard deviation, item-total correlations, and Cronbach's alpha) were compiled and difficulty parameters were computed with Winsteps using the polytomous Rasch Rating Scale Model (RRSM). An item-person (Wright) map of the SSEAS was examined for content coverage and item targeting. The SSEAS has good psychometric properties, including good internal consistency (Cronbach's alpha = 0.80) and fit with the model, and no evidence for differential item functioning (DIF) due to experimental trial grouping was detected. The present study confirms the SSEA scale as a brief, useful and reliable tool for assessing attitudes towards skin self-examination in a population of men 50 years or older in Queensland, Australia. The 8-item scale shows unidimensionality, allowing levels of SSE attitude, and the item difficulties, to be ranked on a single continuous scale. In terms of clinical practice, it is very important to assess skin cancer self-examination attitude to identify people who may need a more extensive intervention to allow early detection of skin cancer.
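
    The internal-consistency statistic reported above (Cronbach's alpha) can be illustrated with a minimal sketch. The function name and the toy response matrix below are hypothetical and are not taken from the study; the code only shows the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed scale score
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five people to a four-item scale (illustration only)
toy = [
    [1, 2, 2, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 3],
    [4, 3, 4, 4],
    [5, 5, 4, 5],
]
alpha = cronbach_alpha(toy)
```

    Population variances are used for both the items and the total score, so the ratio is the same as it would be with sample variances. The function assumes the total score has non-zero variance.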

  18. Development of a short version of the visual function questionnaire using item-response theory.

    Directory of Open Access Journals (Sweden)

    Shunichi Fukuhara

    PURPOSE: In clinical ophthalmology as in other fields, measuring patient-reported outcomes imposes a burden on patients. To decrease that burden, we used item-response theory (IRT) to develop and test a short version of the National Eye Institute's Visual Function Questionnaire (VFQ). METHODS: We analyzed VFQ data from 276 adults in Japan. Most of them had glaucoma, cataract, or macular degeneration. Their visual acuity (Snellen fraction) averaged 20/120 (range: 20/13 to 20/2000) for the better eye, and 20/200 (range: 20/13 to 20/2000) for the worse eye. We used a polytomous IRT model, the Generalized Partial Credit Model as implemented in software for parameter scaling of rating data (PARSCALE). To select items for inclusion in the short version we examined each item's location on the latent-trait continuum, its slope, and its frequency of missing data. We also ensured representation of all 7 domains that are important in Japan. To examine the characteristics of the resulting scale, we computed its test information (an index of precision that can vary with the value of the latent trait), and carried out validation testing. RESULTS: From 32 of the original VFQ items, we selected 11. The scale comprising those 11 items (the VFQ-J11) had test information greater than 9 for values of the latent trait between -2.0 and +0.8. The item thresholds were well-targeted for patients with vision problems. Scores on the VFQ-J11 correlated strongly and in the expected direction with measures of visual field and corrected visual acuity. As expected for a valid measure, those scores also improved by a large amount (almost one standard deviation) after cataract surgery. CONCLUSION: This 11-item instrument can provide reliable and valid data on visual functioning in patients with ophthalmic problems. It is expected to be less of a burden on respondents, while it maintains good psychometric properties.
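
    To make the "test information greater than 9" threshold above more concrete, the following sketch converts test information into a standard error of measurement and an approximate reliability. It relies on the usual IRT relationship SE = 1 / sqrt(information) and assumes a latent trait scaled to unit variance; the function names are hypothetical and this is not the authors' computation.

```python
import math


def standard_error(information):
    """Standard error of the trait estimate at a given test information."""
    return 1.0 / math.sqrt(information)


def approx_reliability(information, trait_variance=1.0):
    """Approximate reliability, assuming the latent trait has the given variance."""
    return 1.0 - (1.0 / information) / trait_variance


se_at_9 = standard_error(9.0)        # about 0.33 on the latent-trait scale
rel_at_9 = approx_reliability(9.0)   # about 0.89 under the unit-variance assumption
```

    Under these assumptions, information above 9 corresponds to a standard error below one third of a latent-trait unit across the stated range.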

  19. Applicability of Item Response Theory to the Korean Nurses' Licensing Examination

    Directory of Open Access Journals (Sweden)

    Geum-Hee Jeong

    2005-06-01

    To test the applicability of item response theory (IRT) to the Korean Nurses' Licensing Examination (KNLE), item analysis was performed after testing the unidimensionality and goodness-of-fit. The results were compared with those based on classical test theory. The results of the 330-item KNLE administered to 12,024 examinees in January 2004 were analyzed. Unidimensionality was tested using DETECT and the goodness-of-fit was tested using WINSTEPS for the Rasch model and Bilog-MG for the two-parameter logistic model. Item analysis and ability estimation were done using WINSTEPS. Using DETECT, Dmax ranged from 0.1 to 0.23 for each subject. The mean square value of the infit and outfit values of all items using WINSTEPS ranged from 0.1 to 1.5, except for one item in pediatric nursing, which scored 1.53. Of the 330 items, 218 (42.7%) were misfit using the two-parameter logistic model of Bilog-MG. The correlation coefficients between the difficulty parameter using the Rasch model and the difficulty index from classical test theory ranged from 0.9039 to 0.9699. The correlation between the ability parameter using the Rasch model and the total score from classical test theory ranged from 0.9776 to 0.9984. Therefore, the results of the KNLE fit unidimensionality and goodness-of-fit for the Rasch model. The KNLE should be a good sample for analysis according to the IRT Rasch model, so further research using IRT is possible.

  20. A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

    Science.gov (United States)

    Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

    2017-08-04

    The validity of the CAGE using item response theory (IRT) has not yet been examined in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed that the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and the ones with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for two items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problem and the precision of the cut-off scores in the older adult population.
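
    The scalability H index used in Mokken scaling can be sketched for dichotomous items such as the four yes/no CAGE questions. The sketch below computes Loevinger's H as the sum of observed inter-item covariances divided by the sum of the maximum covariances attainable given the item popularities. The function name and the toy data are hypothetical, and the study itself would have used dedicated software rather than this code.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale-level Loevinger's H for 0/1 item scores (rows are respondents)."""
    n = len(data)
    k = len(data[0])
    # Proportion endorsing each item
    p = [sum(row[j] for row in data) / n for j in range(k)]
    cov_sum = 0.0
    max_sum = 0.0
    for i, j in combinations(range(k), 2):
        # Observed proportion endorsing both items
        p_ij = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_ij - p[i] * p[j]
        # Largest joint proportion possible is min(p_i, p_j)
        max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / max_sum


# Hypothetical perfect Guttman pattern: H should equal 1
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
h_guttman = loevinger_h(guttman)
```

    By the commonly cited rule of thumb, H between 0.4 and 0.5 indicates a medium scale, which is consistent with the reported value of 0.459. The function assumes that no item is endorsed by everyone or by no one, since the denominator would then be zero.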

  1. Linking Physical and Mental Health Summary Scores from the Veterans RAND 12-Item Health Survey (VR-12) to the PROMIS(®) Global Health Scale.

    Science.gov (United States)

    Schalet, Benjamin D; Rothrock, Nan E; Hays, Ron D; Kazis, Lewis E; Cook, Karon F; Rutsohn, Joshua P; Cella, David

    2015-10-01

    Global health measures represent an attractive option for researchers and clinicians seeking a brief snapshot of a patient's overall perspective on his or her health. Because scores on different global health measures are not comparable, comparative effectiveness research (CER) is challenging. To establish a common reporting metric so that the physical and mental health scores on the Veterans RAND 12-Item Health Survey (VR-12©) can be converted into scores on the corresponding Patient Reported Outcomes Measurement Information System (PROMIS(®)) Global Health scores. Following a single-sample linking design, participants from an Internet panel completed items from the PROMIS Global Health and VR-12 Health Survey. A common metric was created using analyses based on item response theory (IRT), producing score cross-walk tables for the mental and physical health components of each measure. The linking relationships were evaluated by calculating the standard deviation of differences between the observed and linked PROMIS scores and estimating confidence intervals by sample size. Participants (N = 2025) were 49% male and 73% white; mean age was 46 years. Mental and physical health subscales of the PROMIS Global Health and the VR-12. The mean VR-12 physical component and mental component scores were 45.2 and 46.6, respectively; the mean PROMIS physical and mental health scores were 48.3 and 48.5, respectively. We found evidence that the combined set of VR-12 and PROMIS items were relatively unidimensional and that we could proceed with linking. Linking worked better between the physical health than mental health scores using VR-12 item responses (vs. linking based on algorithmic scores). For each of the cross-walks, users can minimize the impact of linking error with modest increases in sample sizes. VR-12 scores can be expressed on the PROMIS Global Health metric to facilitate the evaluation of treatment, including CER. Extending these results to other common

  2. Item response theory analysis of the Lichtenberg Financial Decision Screening Scale.

    Science.gov (United States)

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Lichtenberg, Peter A

    2017-06-07

    The focus of these analyses was to examine the psychometric properties of the Lichtenberg Financial Decision Screening Scale (LFDSS). The purpose of the screen was to evaluate the decisional abilities and vulnerability to exploitation of older adults. Adults aged 60 and over were interviewed by social, legal, financial, or health services professionals who underwent in-person training on the administration and scoring of the scale. Professionals provided a rating of the decision-making abilities of the older adult. The analytic sample included 213 individuals with an average age of 76.9 (SD = 10.1). The majority (57%) were female. Data were analyzed using item response theory (IRT) methodology. The results supported the unidimensionality of the item set. Several IRT models were tested. Ten ordinal and binary items evidenced a slightly higher reliability estimate (0.85) than other versions and better coverage in terms of the range of reliable measurement across the continuum of financial incapacity.

  3. Measuring the quality of life in hypertension according to Item Response Theory.

    Science.gov (United States)

    Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; Andrade, Dalton Francisco de; Barbetta, Pedro Alberto; Souza, Ana Célia Caetano de; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

    2017-05-04

    To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL - Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the aid of psych and mirt. The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies.
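
    As an illustration of the graded response model named above, the sketch below computes the probability of each ordered response category for one item. It uses the standard logistic form, in which the probability of responding in category k or higher is 1 / (1 + exp(-a(theta - b_k))), and category probabilities are differences of adjacent cumulative probabilities. The function name and parameter values are hypothetical and are not the MINICHAL estimates; the study itself fitted the model in R.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta: latent trait value; a: item discrimination;
    thresholds: increasing list of category boundary locations b_k.
    """
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category or higher) and 0 (beyond the highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Probability of each category is the difference of adjacent cumulatives
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical four-category item evaluated at theta = 0
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

    With three thresholds the function returns four probabilities that sum to one. A larger discrimination value a makes the category probabilities change more sharply with theta, which is the sense in which some items discriminate better than others.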

  4. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    Science.gov (United States)

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (getting around, self-care, getting along with others, life activities and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology.

  5. Development and Reliability of Items Measuring the Nonmedical Use of Prescription Drugs for the Youth Risk Behavior Survey: Results From an Initial Pilot Test

    Science.gov (United States)

    Howard, Melissa M.; Weiler, Robert M.; Haddox, J. David

    2009-01-01

    Background: The purpose of this study was to develop and test the reliability of self-report survey items designed to monitor the nonmedical use of prescription drugs among adolescents. Methods: Eighteen nonmedical prescription drug items designed to be congruent with the substance abuse items in the US Centers for Disease Control and Prevention's…

  6. Improving Survey Response Rates in Online Panels

    DEFF Research Database (Denmark)

    Pedersen, Mogens Jin; Nielsen, Christian Videbæk

    2016-01-01

    Identifying ways to efficiently maximize the response rate to surveys is important to survey-based research. However, evidence on the response rate effect of donation incentives and especially altruistic and egotistic-type text appeal interventions is sparse and ambiguous. By a randomized survey experiment among 6,162 members of an online survey panel, this article shows how low-cost incentives and cost-free text appeal interventions may impact the survey response rate in online panels. The experimental treatments comprise (a) a cash prize lottery incentive, (b) two donation incentives equating survey response with a monetary donation to a good cause, (c) an egotistic-type text appeal, and (d) an altruistic-type text appeal. Relative to a control group, we find higher response rates among the recipients of the egotistic-type text appeal and the lottery incentive. Donation incentives yield lower...

  7. The Psychological Effect of Errors in Standardized Language Test Items on EFL Students' Responses to the Following Item

    Science.gov (United States)

    Khaksefidi, Saman

    2017-01-01

    This study investigates the psychological effect of a wrong question with wrong items on answering to the next question in a test of structure. Forty students selected through stratified random sampling are given 15 questions of a standardized test namely a TOEFL structure test in which questions number 7 and number 11 are wrong and their answers…

  8. Item response theory applied to factors affecting the patient journey towards hearing rehabilitation

    Directory of Open Access Journals (Sweden)

    Michelene Chenault

    2016-11-01

    To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to three scales of the hearing aid rehabilitation questionnaire (aid stigma, pressure, and aid unwanted, addressing respectively hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit) were evaluated with item response theory. The sample comprised 212 persons aged 55 years or more: 63 hearing aid users, 64 persons with and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4 items, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired.

  9. Development of an abbreviated Career Indecision Profile-65 using item response theory: The CIP-Short.

    Science.gov (United States)

    Xu, Hui; Tracey, Terence J G

    2017-03-01

    The current study developed an abbreviated version of the Career Indecision Profile-65 (CIP-65; Hacker, Carr, Abrams, & Brown, 2013) by using item response theory. In order to improve the efficiency of the CIP-65 in measuring career indecision, the individual item performance of the CIP-65 was examined with respect to the ordering of response occurrence and gender differential item functioning. The best 5 items of each scale of the CIP-65 (i.e., neuroticism/negative affectivity, choice/commitment anxiety, lack of readiness, and interpersonal conflicts) were retained in the CIP-Short using a sample of 588 college students. A validation sample (N = 174) supported the reliability and structural validity of the CIP-Short. The convergent and divergent validity of the CIP-Short was additionally supported in the findings of a hypothesized differential relational pattern in a separate sample (N = 360). While the current study supported the CIP-Short being a sound brief measure of career indecision, the limitations of this study and suggestions for future research were discussed as well.

  10. Further Investigating Method Effects Associated with Negatively Worded Items on Self-Report Surveys

    Science.gov (United States)

    DiStefano, Christine; Motl, Robert W.

    2006-01-01

    This article used multitrait-multimethod methodology and covariance modeling for an investigation of the presence and correlates of method effects associated with negatively worded items on the Rosenberg Self-Esteem (RSE) scale (Rosenberg, 1989) using a sample of 757 adults. Results showed that method effects associated with negative item phrasing…

  12. Modeling a Composite Score in Parkinson's Disease Using Item Response Theory.

    Science.gov (United States)

    Gottipati, Gopichand; Karlsson, Mats O; Plan, Elodie L

    2017-02-28

    In the current work, we present the methodology for development of an Item Response Theory model within a non-linear mixed effects framework to characterize the longitudinal changes of the Movement Disorder Society (sponsored revision) of Unified Parkinson's Disease Rating Scale (MDS-UPDRS) endpoint in Parkinson's disease (PD). The data were obtained from Parkinson's Progression Markers Initiative database and included 163,070 observations up to 48 months from 430 subjects belonging to De Novo PD cohort. The probability of obtaining a score, reported for each of the items in the questionnaire, was modeled as a function of the subject's disability. Initially, a single latent variable model was explored to characterize the disease progression over time. However, based on the understanding of the questionnaire set-up and the results of a residuals-based diagnostic tool, a three latent variable model with a mixture implementation was able to adequately describe longitudinal changes not only at the total score level but also at each individual item level. The linear progression rates obtained for the patient-reported items and the non-sided items were similar, each of which roughly take about 50 months for a typical subject to progress linearly from the baseline by one standard deviation. However for the sided items, it was found that the better side deteriorates quicker than the disabled side. This study presents a framework for analyzing MDS-UPDRS data, which can be adapted to more traditional UPDRS data collected in PD clinical trials and result in more efficient designs and analyses of such studies.

  13. Sample Size Requirements for Estimation of Item Parameters in the Multidimensional Graded Response Model

    Directory of Open Access Journals (Sweden)

    Shengyu Jiang

    2016-02-01

    Likert-type rating scales, in which a respondent chooses a response from an ordered set of response options, are used to measure a wide variety of psychological, educational, and medical outcome variables. The most appropriate item response theory model for analyzing and scoring these instruments when they provide scores on multiple scales is the multidimensional graded response model (MGRM). A simulation study was conducted to investigate the variables that might affect item parameter recovery for the MGRM. Data were generated based on different sample sizes, test lengths, and scale intercorrelations. Parameter estimates were obtained through the flexMIRT software. The quality of parameter recovery was assessed by the correlation between true and estimated parameters as well as bias and root-mean-square error. Results indicated that for the vast majority of cases studied a sample size of N = 500 provided accurate parameter estimates, except for tests with 240 items when 1,000 examinees were necessary to obtain accurate parameter estimates. Increasing sample size beyond N = 1,000 did not increase the accuracy of MGRM parameter estimates.
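
    The three recovery criteria named above (correlation between true and estimated parameters, bias, and root-mean-square error) can be sketched as follows. The function name and the example values are hypothetical; they are not results from the simulation study.

```python
import math
from statistics import mean


def recovery_metrics(true_vals, est_vals):
    """Return (correlation, bias, rmse) for estimated versus true parameters."""
    errors = [e - t for t, e in zip(true_vals, est_vals)]
    bias = mean(errors)                             # average signed error
    rmse = math.sqrt(mean([err ** 2 for err in errors]))
    # Pearson correlation computed from deviations about the means
    mt, me = mean(true_vals), mean(est_vals)
    cov = sum((t - mt) * (e - me) for t, e in zip(true_vals, est_vals))
    var_t = sum((t - mt) ** 2 for t in true_vals)
    var_e = sum((e - me) ** 2 for e in est_vals)
    corr = cov / math.sqrt(var_t * var_e)
    return corr, bias, rmse


# Hypothetical true discriminations and estimates that are uniformly 0.1 too high
true_a = [0.5, 1.0, 1.5, 2.0]
est_a = [0.6, 1.1, 1.6, 2.1]
corr, bias, rmse = recovery_metrics(true_a, est_a)
```

    The example shows why the criteria are reported together: a constant overestimate leaves the correlation at 1 while bias and RMSE both reveal the error.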

  14. Analyzing Multiple-Choice Questions by Model Analysis and Item Response Curves

    Science.gov (United States)

    Wattanakasiwich, P.; Ananta, S.

    2010-07-01

    In physics education research, the main goal is to improve physics teaching so that most students understand physics conceptually and are able to apply concepts in solving problems. Many multiple-choice instruments have therefore been developed to probe students' conceptual understanding of various topics. Two techniques, model analysis and item response curves, were used to analyze students' responses to the Force and Motion Conceptual Evaluation (FMCE). For this study, FMCE data from more than 1000 students at Chiang Mai University were collected over the past three years. With model analysis, we can obtain students' alternative knowledge and the probabilities for students to use such knowledge in a range of equivalent contexts. The model analysis consists of two algorithms: concentration factor and model estimation. This paper presents only results from using the model estimation algorithm to obtain a model plot. The plot helps to identify whether a class model state is in the misconception region. An item response curve (IRC), derived from item response theory, is a plot of the percentage of students selecting a particular choice versus their total score. Pros and cons of both techniques are compared and discussed.
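
    The item response curve described above can be tabulated directly from raw answers. The sketch below, with a hypothetical function name and made-up data, computes for one question the percentage of students at each total score who selected each answer choice; plotting those percentages against total score gives the curve.

```python
from collections import Counter, defaultdict


def item_response_curve(choices, total_scores):
    """Map each total score to {choice: percentage of students selecting it}."""
    counts = defaultdict(Counter)
    for choice, score in zip(choices, total_scores):
        counts[score][choice] += 1
    curve = {}
    for score in sorted(counts):
        n = sum(counts[score].values())  # students with this total score
        curve[score] = {c: 100.0 * m / n for c, m in counts[score].items()}
    return curve


# Hypothetical answers of six students to one question, with their total scores
choices = ["A", "B", "A", "C", "C", "C"]
totals = [1, 1, 2, 2, 3, 3]
irc = item_response_curve(choices, totals)
```

    In this made-up example the correct-looking choice "C" dominates only among the highest scorers, which is the kind of pattern such a plot is meant to reveal. With real data, adjacent total scores are often binned so that each point rests on enough students.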

  15. KernSmoothIRT: An R Package for Kernel Smoothing in Item Response Theory

    Directory of Open Access Journals (Sweden)

    Angelo Mazza

    2014-06-01

    Item response theory (IRT) models are a class of statistical models used to describe the response behaviors of individuals to a set of items having a certain number of options. They are adopted by researchers in social science, particularly in the analysis of performance or attitudinal data, in psychology, education, medicine, marketing and other fields where the aim is to measure latent constructs. Most IRT analyses use parametric models that rely on assumptions that often are not satisfied. In such cases, a nonparametric approach might be preferable; nevertheless, few software implementations support it. To address this gap, this paper presents the R package KernSmoothIRT. It implements kernel smoothing for the estimation of option characteristic curves, and adds several plotting and analytical tools to evaluate the whole test/questionnaire, the items, and the subjects. In order to show the package's capabilities, two real datasets are used, one employing multiple-choice responses, and the other scaled responses.
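
    The idea of kernel smoothing an option characteristic curve can be sketched with a Nadaraya-Watson estimator: at each point on a trait grid, the estimated probability of choosing an option is a kernel-weighted average of 0/1 indicators of whether each subject chose it. This is a Python illustration under stated assumptions, not the KernSmoothIRT API. The function names, the Gaussian kernel, the fixed bandwidth, and the toy trait proxies are all choices made here for illustration.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def smoothed_occ(theta_grid, theta_hat, chose_option, bandwidth):
    """Kernel-smoothed probability of selecting an option along theta_grid.

    theta_hat: per-subject trait proxies (assumed given, e.g. from ranked scores);
    chose_option: 1 if the subject selected the option, else 0.
    """
    curve = []
    for t in theta_grid:
        weights = [gaussian_kernel((t - th) / bandwidth) for th in theta_hat]
        num = sum(w * y for w, y in zip(weights, chose_option))
        curve.append(num / sum(weights))
    return curve


# Hypothetical data: only the two highest-trait subjects chose the option
theta_hat = [-2.0, -1.0, 0.0, 1.0, 2.0]
chose = [0, 0, 0, 1, 1]
occ = smoothed_occ([-2.0, 0.0, 2.0], theta_hat, chose, bandwidth=1.0)
```

    The bandwidth controls the trade-off between smoothness and fidelity to the data: a small value follows the observed responses closely, while a large value flattens the curve toward the overall proportion choosing the option.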

  16. Item response theory analysis of the modified Roland-Morris Disability Questionnaire in a population-based study.

    Science.gov (United States)

    Mielenz, Thelma J; Carey, Timothy S; Edwards, Michael C

    2015-03-15

    This is a secondary analysis of a cross-sectional population-based survey. Shorten the modified 23-item Roland (mRoland) scale using item response theory (IRT) methods and describe where in the functional disability range each scale is the most precise. The Roland-Morris Disability Questionnaire is recommended for a functional disability outcome measure in patients with low back pain (LBP). One commonly used version is the Roland. It is unknown where in the functional disability range the Roland measures. One candidate individual with LBP in randomly selected households was interviewed, identifying 694 adults with chronic LBP. To justify the use of a unidimensional 2-parameter logistic IRT model, we performed both exploratory and confirmatory factor analysis. Exploratory factor analysis revealed one dominant eigenvalue. Confirmatory factor analysis results indicate that the 1-factor model fit well. IRT analysis revealed variability in the slopes, in the range from 1.07 to 3.10. The marginal reliability, an IRT-based analog to coefficient α, was 0.88. The mRoland produces reliable scores (i.e., with a standard error disability. The mRoland measures high levels of functional disability with relatively poor reliability and may be more appropriate for a less-disabled population with LBP. We demonstrate that the mRoland can be shortened to 11 items with minimal loss of information. We show that there are different ways to go about selecting the set of 11 items that yield short forms with different strengths. 3.

  17. Application of Group-Level Item Response Models in the Evaluation of Consumer Reports about Health Plan Quality

    Science.gov (United States)

    Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.

    2006-01-01

    Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…

  18. On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

    Science.gov (United States)

    Raykov, Tenko; Marcoulides, George A.

    2016-01-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…

  19. An Evaluation of the Brief Symptom Inventory-18 Using Item Response Theory : Which Items Are Most Strongly Related to Psychological Distress?

    NARCIS (Netherlands)

    Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent

    2011-01-01

    The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a

  20. An evaluation of the Brief Symptom Inventory-18 using item response theory: which items are most strongly related to psychological distress?

    NARCIS (Netherlands)

    Meijer, Rob R.; Vries, de Rivka M.; Bruggen, van Vincent

    2011-01-01

    The psychometric structure of the Brief Symptom Inventory–18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a

  1. An Evaluation of the Brief Symptom Inventory-18 Using Item Response Theory: Which Items Are Most Strongly Related to Psychological Distress?

    Science.gov (United States)

    Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent

    2011-01-01

    The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a strong Mokken scale for outpatients and…

  2. Re-evaluating a vision-related quality of life questionnaire with item response theory (IRT) and differential item functioning (DIF) analyses.

    NARCIS (Netherlands)

    Nispen, R.M.A. van; Knol, D.L.; Langelaan, M.; Rens, G.H.M.B. van

    2011-01-01

    Background: For the Low Vision Quality Of Life questionnaire (LVQOL) it is unknown whether the psychometric properties are satisfactory when an item response theory (IRT) perspective is considered. This study evaluates some essential psychometric properties of the LVQOL questionnaire in an IRT model

  3. Item Response Theory Analysis and Differential Item Functioning across Age, Gender and Country of a Short Form of the Advanced Progressive Matrices

    Science.gov (United States)

    Chiesi, Francesca; Ciancaleoni, Matteo; Galli, Silvia; Morsanyi, Kinga; Primi, Caterina

    2012-01-01

    Item Response Theory (IRT) models were applied to investigate the psychometric properties of the Arthur and Day's Advanced Progressive Matrices-Short Form (APM-SF; 1994) [Arthur and Day (1994). "Development of a short form for the Raven Advanced Progressive Matrices test." "Educational and Psychological Measurement, 54," 395-403] in order to test…

  4. Is a single-item visual analogue scale as valid, reliable and responsive as multi-item scales in measuring quality of life?

    NARCIS (Netherlands)

    Boer, A.G.E.M. de; Lanschot, J.J.B. van; Stalmeier, P.F.M.; Sandick, J.W. van; Hulscher, J.B.F.; Haes, J.C.J.M. de; Sprangers, M.A.G.

    2004-01-01

    PURPOSE: To compare the validity, reliability and responsiveness of a single, global quality of life question to multi-item scales. METHOD: Data were obtained from 83 consecutive patients with oesophageal adenocarcinoma undergoing either transhiatal or transthoracic oesophagectomy. Quality of life w

  5. Detection and validation of unscalable item score patterns using Item Response Theory: An illustration with Harter's Self-Perception Profile for Children

    NARCIS (Netherlands)

    Meijer, R.R.; Egberink, I.J.L.; Emons, Wilco H.M.; Sijtsma, Klaas

    2008-01-01

    We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile

  6. An evaluation of the brief symptom inventory-18 using item response theory: which items are most strongly related to psychological distress?

    NARCIS (Netherlands)

    Meijer, R.R.; de Vries, Rivka M.; van Bruggen, Vincent

    2011-01-01

    The psychometric structure of the Brief Symptom Inventory–18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a

  7. An Evaluation of the Brief Symptom Inventory-18 Using Item Response Theory : Which Items Are Most Strongly Related to Psychological Distress?

    NARCIS (Netherlands)

    Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent

    The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a

  8. An Evaluation of the Brief Symptom Inventory-18 Using Item Response Theory: Which Items Are Most Strongly Related to Psychological Distress?

    Science.gov (United States)

    Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent

    2011-01-01

    The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a strong Mokken scale for outpatients and…

  9. Capturing Abnormal Personality With Normal Personality Inventories: An Item Response Theory Approach

    OpenAIRE

    2008-01-01

    Correlational and factor-analytic methods indicate that abnormal and normal personality constructs may be tapping the same underlying latent trait. However, they do not systematically demonstrate that measures of abnormal personality capture more extreme ranges of the latent trait than measures of normal range personality. Item Response Theory (IRT) methods, in contrast, do provide this information. In the present study, we use IRT methods to evaluate the range of the latent trait assessed wi...

  10. The diagnostic utility of separation anxiety disorder symptoms: an item response theory analysis.

    Science.gov (United States)

    Cooper-Vince, Christine E; Emmert-Aronson, Benjamin O; Pincus, Donna B; Comer, Jonathan S

    2014-01-01

    At present, it is not clear whether the current definition of separation anxiety disorder (SAD) is the optimal classification of developmentally inappropriate, severe, and interfering separation anxiety in youth. Much remains to be learned about the relative contributions of individual SAD symptoms for informing diagnosis. Two-parameter logistic Item Response Theory analyses were conducted on the eight core SAD symptoms in an outpatient anxiety sample of treatment-seeking children (N = 359, 59.3% female, M Age = 11.2) and their parents to determine the diagnostic utility of each of these symptoms. Analyses considered values of item threshold, which characterize the SAD severity level at which each symptom has a 50% chance of being endorsed, and item discrimination, which characterize how well each symptom distinguishes individuals with higher and lower levels of SAD. Distress related to separation and fear of being alone without major attachment figures showed the strongest discrimination properties and the lowest thresholds for being endorsed. In contrast, worry about harm befalling attachment figures showed the poorest discrimination properties, and nightmares about separation showed the highest threshold for being endorsed. Distress related to separation demonstrated crossing differential item functioning associated with age: at lower separation anxiety levels, excessive fear at separation was more likely to be endorsed for children ≥9 years, whereas at higher levels this symptom was more likely to be endorsed by children <9 years. Implications are discussed for optimizing the taxonomy of SAD in youth.
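
The two item parameters described in this abstract can be illustrated with a minimal sketch of the two-parameter logistic (2PL) item response function. The parameter values below are hypothetical and are not taken from the study; the names `p_endorse_2pl`, `a`, and `b` are chosen here for illustration only.

```python
import math


def p_endorse_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the 2PL model.

    theta: latent severity of the respondent
    a:     item discrimination (slope of the curve at the threshold)
    b:     item threshold (severity at which endorsement probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: one sharply discriminating with a low threshold,
# one weakly discriminating with a high threshold.
items = {
    "low threshold, high discrimination": (2.0, -0.5),
    "high threshold, low discrimination": (0.8, 1.5),
}

for label, (a, b) in items.items():
    probs = [round(p_endorse_2pl(theta, a, b), 3) for theta in (-1.0, 0.0, 1.0, 2.0)]
    print(label, probs)
```

By construction the endorsement probability equals 0.5 when the latent severity equals the threshold `b`, and a larger `a` makes the curve separate respondents just below and just above that threshold more sharply.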

  11. Reading ability and print exposure: item response theory analysis of the author recognition test.

    Science.gov (United States)

    Moore, Mariah; Gordon, Peter C

    2015-12-01

    In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
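
The scoring rule mentioned in this abstract (names selected minus foils selected, optionally with a heavier foil penalty) can be sketched as follows. The item names and the penalty value of 2.0 are hypothetical placeholders; the abstract does not state which penalty the authors adopted.

```python
def art_score(selected, authors, foils, foil_penalty=1.0):
    """Score an author recognition test response.

    selected:     names the participant marked as authors
    authors:      the real author names on the checklist
    foils:        the non-author names on the checklist
    foil_penalty: points subtracted per selected foil (1.0 is the
                  standard rule; larger values are an assumption here)
    """
    selected = set(selected)
    hits = len(selected & set(authors))
    false_alarms = len(selected & set(foils))
    return hits - foil_penalty * false_alarms


# Placeholder checklist, not the real ART items.
authors = ["Author A", "Author B", "Author C", "Author D"]
foils = ["Foil X", "Foil Y"]
response = ["Author A", "Author B", "Author D", "Foil X"]

print(art_score(response, authors, foils))                    # standard scoring
print(art_score(response, authors, foils, foil_penalty=2.0))  # heavier foil penalty
```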

  12. ltm: An R Package for Latent Variable Modeling and Item Response Analysis

    Directory of Open Access Journals (Sweden)

    Dimitris Rizopoulos

    2006-11-01

    The R package ltm has been developed for the analysis of multivariate dichotomous and polytomous data using latent variable models, under the Item Response Theory approach. For dichotomous data the Rasch, the Two-Parameter Logistic, and Birnbaum's Three-Parameter models have been implemented, whereas for polytomous data Samejima's Graded Response model is available. Parameter estimates are obtained under marginal maximum likelihood using the Gauss-Hermite quadrature rule. The capabilities and features of the package are illustrated using two real data examples.
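
As a rough illustration of the estimation idea named in this abstract, the sketch below computes the marginal probability of one dichotomous response pattern by Gauss-Hermite quadrature over a standard normal latent trait. It is written in Python with NumPy and is not the ltm package's API; the function names and all parameter values are assumptions made for the example.

```python
import numpy as np


def p_correct_3pl(theta, a, b, c):
    """Birnbaum's three-parameter logistic model.

    With c = 0 this reduces to the 2PL; with c = 0 and a common a, to the Rasch model.
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))


def marginal_pattern_prob(responses, a, b, c=None, n_nodes=21):
    """Marginal probability of a 0/1 response pattern, integrating theta ~ N(0, 1)."""
    u = np.asarray(responses)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.zeros_like(a) if c is None else np.asarray(c, dtype=float)

    # hermgauss targets the weight exp(-x^2); rescale nodes and weights for N(0, 1).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    theta = np.sqrt(2.0) * x
    weights = w / np.sqrt(np.pi)

    # Rows are quadrature nodes, columns are items.
    p = p_correct_3pl(theta[:, None], a[None, :], b[None, :], c[None, :])
    likelihood = np.prod(np.where(u[None, :] == 1, p, 1.0 - p), axis=1)
    return float(np.sum(weights * likelihood))


# Hypothetical three-item test.
a = [1.2, 0.8, 1.5]
b = [-0.5, 0.0, 1.0]
c = [0.0, 0.2, 0.1]
print(marginal_pattern_prob([1, 1, 0], a, b, c))
```

Marginal maximum likelihood would maximise the sum of the logarithms of such pattern probabilities over the item parameters; the optimisation step is omitted here.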

  13. Limited information estimation of the diffusion-based item response theory model for responses and response times.

    Science.gov (United States)

    Ranger, Jochen; Kuhn, Jörg-Tobias; Szardenings, Carsten

    2016-05-01

    Psychological tests are usually analysed with item response models. Recently, some alternative measurement models have been proposed that were derived from cognitive process models developed in experimental psychology. These models consider the responses but also the response times of the test takers. Two such models are the Q-diffusion model and the D-diffusion model. Both models can be calibrated with the diffIRT package of the R statistical environment via marginal maximum likelihood (MML) estimation. In this manuscript, an alternative approach to model calibration is proposed. The approach is based on weighted least squares estimation and parallels the standard estimation approach in structural equation modelling. Estimates are determined by minimizing the discrepancy between the observed and the implied covariance matrix. The estimator is simple to implement, consistent, and asymptotically normally distributed. Least squares estimation also provides a test of model fit by comparing the observed and implied covariance matrix. The estimator and the test of model fit are evaluated in a simulation study. Although parameter recovery is good, the estimator is less efficient than the MML estimator.
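
The least squares idea in this abstract, minimising the discrepancy between the observed and the model-implied covariance matrix, can be sketched with a generic discrepancy function. The sketch swaps in a simple one-factor model for the implied covariance rather than the diffusion-based models of the paper, and uses an identity weight matrix by default; both choices, and all names and values, are assumptions for illustration.

```python
import numpy as np


def vech(m):
    """Stack the lower-triangular elements (including the diagonal) of a square matrix."""
    return m[np.tril_indices(m.shape[0])]


def implied_cov(loadings, uniquenesses):
    """Covariance implied by a one-factor model: lambda lambda' + diag(psi)."""
    lam = np.asarray(loadings, dtype=float)
    return np.outer(lam, lam) + np.diag(uniquenesses)


def wls_discrepancy(observed_cov, loadings, uniquenesses, weight=None):
    """Weighted least squares discrepancy (s - sigma)' W (s - sigma)."""
    resid = vech(np.asarray(observed_cov, dtype=float)) - vech(
        implied_cov(loadings, uniquenesses)
    )
    if weight is None:
        weight = np.eye(resid.size)  # unweighted least squares as a default
    return float(resid @ weight @ resid)


# Hypothetical "observed" matrix generated from known parameters.
true_loadings = [0.8, 0.7, 0.6]
true_uniq = [0.36, 0.51, 0.64]
S = implied_cov(true_loadings, true_uniq)

print(wls_discrepancy(S, true_loadings, true_uniq))    # zero at the generating values
print(wls_discrepancy(S, [0.5, 0.5, 0.5], true_uniq))  # positive elsewhere
```

An estimator of this type would pass the discrepancy to a numerical optimiser and take the minimising parameter values as the estimates.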

  14. The dimensionality of the Edinburgh Handedness Inventory: An analysis with models of the item response theory.

    Science.gov (United States)

    Büsch, Dirk; Hagemann, Norbert; Bender, Nils

    2010-11-01

    Handedness is frequently measured with sum scores or quotients taken from laterality questionnaires like the Edinburgh Handedness Inventory (EHI). In classical test theory such data cannot be used to confirm either the unidimensionality (i.e., quantitative differentiation with the poles left-handed and right-handed) or multidimensionality (i.e., typological differentiation between left-, right-, and mixed-handers) of this personal characteristic. This study uses item response theory models to test the construct validity of the EHI on an item level in order to gather empirical support for the differentiation of handedness as well as the appropriateness of the items and the response format. The EHI was given to 540 participants (303 male and 237 female) aged 17-37 years. Results of mixed-Rasch analyses revealed that the best model was a two-class solution; that is, left- and right-handers (types) with quantitative differences between persons. Hence, unlike earlier model tests, this rejects both the unidimensionality of the handedness construct and the need to consider so-called mixed-handers. It is proposed that mixed-Rasch analyses should be applied more frequently to test the construct validity of other as well as more extensive handedness questionnaires.

  15. Gender differences in posttraumatic stress symptoms among OEF/OIF veterans: an item response theory analysis.

    Science.gov (United States)

    King, Matthew W; Street, Amy E; Gradus, Jaimie L; Vogt, Dawne S; Resick, Patricia A

    2013-04-01

    Establishing whether men and women tend to express different symptoms of posttraumatic stress in reaction to trauma is important for both etiological research and the design of assessment instruments. Use of item response theory (IRT) can reveal how symptom reporting varies by gender and help determine if estimates of symptom severity for men and women are equally reliable. We analyzed responses to the PTSD Checklist (PCL) from 2,341 U.S. military veterans (51% female) who completed deployments in support of operations in Afghanistan and Iraq (Operation Enduring Freedom/Operation Iraqi Freedom [OEF/OIF]), and tested for differential item functioning by gender with an IRT-based approach. Among men and women with the same overall posttraumatic stress severity, women tended to report more frequent concentration difficulties and distress from reminders whereas men tended to report more frequent nightmares, emotional numbing, and hypervigilance. These item-level gender differences were small (on average d = 0.05), however, and had little impact on PCL measurement precision or expected total scores. For practical purposes, men's and women's severity estimates had similar reliability. This provides evidence that men and women veterans demonstrate largely similar profiles of posttraumatic stress symptoms following exposure to military-related stressors, and some theoretical perspectives suggest this may hold in other traumatized populations.

  16. A New Item Response Theory Model for Open-Ended Online Homework with Multiple Allowed Attempts

    CERN Document Server

    Gönülateş, Emre

    2015-01-01

    Item Response Theory (IRT) was originally developed in traditional exam settings, and it has been shown that the model does not readily transfer to formative assessment in the form of online homework. We investigate if this is mostly due to learner traits that do not become apparent in exam settings, namely random guessing due to lack of diligence or dedication, and copying work from other students or resources. Both of these traits mask the true ability of the learner, which is the only trait considered in most mainstream unidimensional IRT models. We find that introducing these traits does allow us to better assess the true ability of the learners, as well as to better gauge the quality of assessment items. Correspondence of the model traits to self-reported behavior is investigated and confirmed. We find that of these two traits, copying answers has a larger influence on initial homework attempts than random guessing.

  17. Factor and item response theory analysis of the Protean and Boundaryless Career Attitude Scales

    Directory of Open Access Journals (Sweden)

    Gideon P. de Bruin

    2010-12-01

    Orientation: The concepts of the Protean Career and the Boundaryless Career show potential as frameworks for research and practice in the contemporary world of work. Briscoe, Hall and DeMuth (2006) developed the Protean and Boundaryless Career Attitude Scales, which consist of the Self-Directed Career Management, Values Driven, Boundaryless Mindset and Mobility Preference subscales. However, the standardisation and replication studies conducted by Briscoe et al. left some questions unanswered in terms of the psychometric properties of the subscales. Research purpose: This study examines the psychometric properties of the Protean and Boundaryless Career Attitude Scales with the aim of clarifying the structure of the scales, examining the quality of the items and evaluating the measurement precision of the scales. Research design, approach and method: Responses of adults to the items of the Protean and Boundaryless Career Attitude Scales were analysed with factor analytic and Rasch item response model techniques. Main findings: Factor and Rasch analyses revealed that three of the four postulated dimensions were replicated, but the Values Driven dimension split into two factors. Misfitting items were identified and sources of their misfit were uncovered. The Rasch analysis showed that three of the four subscales provide most of their psychometric information at the lower ends of their respective latent traits (where relatively few persons are located). Hence, the trait estimates of persons with low scores are more precise than those of persons with high scores. Practical/managerial implications: Overall, the quality of the Protean and Boundaryless Career Attitude Scales is satisfactory, but some aspects that may be improved are identified. Researchers may use at least three of the four subscales with confidence, but more work is possibly needed on the Values Driven subscale. Contribution/value-add: The study provides researchers with information on the

  18. Effect of reducing the number of items of the Oral Health Impact Profile on responsiveness, validity and reliability in edentulous populations.

    Science.gov (United States)

    Awad, Manal; Al-Shamrany, Muneera; Locker, David; Allen, Finbarr; Feine, Jocelyne

    2008-02-01

    The 49-item Oral Health Impact Profile (OHIP) has shown strong responsiveness, reliability and validity. However, the large number of items included may limit its use in clinical trials, clinical practice and surveys. The main objective of this study is to assess the effect of reducing the number of items in each domain, one at a time, on responsiveness, reliability and validity of the OHIP in edentulous populations. Data used in this study were obtained from two randomized clinical trials comparing mandibular implant overdentures and conventional dentures among 102 subjects between 35 and 65 years of age, and 60 subjects over the age of 65 years. Participants were edentulous individuals who wished to replace their current prostheses. Subjects in both trials were asked to complete the 49-item OHIP prior to treatment and at 2 months post-treatment. Within the study, effect sizes were computed at each stage of item reduction using the impact method. Intraclass correlation coefficients and Pearson's correlation coefficients were also assessed at each stage of item reduction. In addition, receiver-operating characteristic (ROC) curves were used to indicate the accuracy with which measurement changes corresponded to judgements of important changes in Oral Health Related Quality of Life (OHRQL). The results indicated that, in general, domain responsiveness was not affected by the reduction of the number of items used per domain. However, there was a decrease in reliability, especially within the 'psychological' and 'social' disabilities and 'handicap' domains (35- to 65-year group). In addition, there was a decrease in construct validity of the 'physical pain', 'psychological' and 'social disabilities' domains (35- to 65-year group), as well as on 'physical pain', 'psychological discomfort', 'physical' and 'psychological' disabilities in the 65-year and older group. This occurred primarily, when reducing from two to one item per domain. Among the 35- to 65-year group

  19. Psychometric Examination of an Inventory of Self-Efficacy for the Holland Vocational Themes Using Item Response Theory

    Science.gov (United States)

    Turner, Brandon M.; Betz, Nancy E.; Edwards, Michael C.; Borgen, Fred H.

    2010-01-01

    The psychometric properties of measures of self-efficacy for the six themes of Holland's theory were examined using item response theory. Item and scale quality were compared across levels of the trait continuum; all the scales were highly reliable but differentiated better at some levels of the continuum than others. Applications for adaptive…

  20. Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

    Science.gov (United States)

    Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

    2010-01-01

    This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…

  1. Estimation of a Ramsay-Curve Item Response Theory Model by the Metropolis-Hastings Robbins-Monro Algorithm

    Science.gov (United States)

    Monroe, Scott; Cai, Li

    2014-01-01

    In Ramsay curve item response theory (RC-IRT) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin's EM algorithm, which yields maximum marginal likelihood estimates. This method, however, does not produce the…

  3. Psychometric Examination of an Inventory of Self-Efficacy for the Holland Vocational Themes Using Item Response Theory

    Science.gov (United States)

    Turner, Brandon M.; Betz, Nancy E.; Edwards, Michael C.; Borgen, Fred H.

    2010-01-01

    The psychometric properties of measures of self-efficacy for the six themes of Holland's theory were examined using item response theory. Item and scale quality were compared across levels of the trait continuum; all the scales were highly reliable but differentiated better at some levels of the continuum than others. Applications for adaptive…

  4. Racial/ethnic differences in responses to the everyday discrimination scale: a differential item functioning analysis.

    Science.gov (United States)

    Lewis, Tené T; Yang, Frances M; Jacobs, Elizabeth A; Fitchett, George

    2012-03-01

    The authors examined the impact of race/ethnicity on responses to the Everyday Discrimination Scale, one of the most widely used discrimination scales in epidemiologic and public health research. Participants were 3,295 middle-aged US women (African-American, Caucasian, Chinese, Hispanic, and Japanese) from the Study of Women's Health Across the Nation (SWAN) baseline examination (1996-1997). Multiple-indicator, multiple-cause models were used to examine differential item functioning (DIF) on the Everyday Discrimination Scale by race/ethnicity. After adjustment for age, education, and language of interview, meaningful DIF was observed for 3 (out of 10) items: "receiving poorer service in restaurants or stores," "being treated as if you are dishonest," and "being treated with less courtesy than other people" (all P's … discrimination differed slightly for women of different racial/ethnic groups, with certain "public" experiences appearing to have more salience for African-American and Chinese women and "dishonesty" having more salience for racial/ethnic minority women overall. "Courtesy" appeared to have more salience for Hispanic women only in comparison with African-American women. Findings suggest that the Everyday Discrimination Scale could potentially be used across racial/ethnic groups as originally intended. However, researchers should use caution with items that demonstrated DIF.

  5. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

    Directory of Open Access Journals (Sweden)

    Marina Fiori

    The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.

  6. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

    Science.gov (United States)

    Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

    2014-01-01

    The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.

  7. Social Support Scale (MOS-SSS): Analysis of the Psychometric Properties via Item Response Theory

    Directory of Open Access Journals (Sweden)

    Daniela Sacramento Zanini

    The study of social relationships that influence health, as well as the development of reliable measures to assess this construct, has been highlighted in the academic literature. The aim of this study was to provide new evidence of validity based on the internal structure and reliability of the MOS-SSS, and to estimate the parameters of items and participants using item response theory. The sample consisted of 998 people (age: M = 27.18, SD = 9.90; 65.1% women) from different sampling strata. Confirmatory factor analysis (CFA) revealed better goodness of fit of the four-factor model when compared to factor structures shown in other Brazilian studies. The multigroup CFA demonstrated invariance of the factor model when comparing the different sampling strata. The partial credit model indicated items with mean difficulty and appropriate adjustment indices (infit/outfit) and desirable reliability for the factors. The analysis of the maps indicated the tool's strengths and limitations to assess the construct.

  8. A 6-item scale for overall, emotional and social loneliness: confirmatory tests on survey data

    NARCIS (Netherlands)

    de Jong Gierveld, J.; van Tilburg, T.

    2006-01-01

    Loneliness is an indicator of social well-being and pertains to the feeling of missing an intimate relationship (emotional loneliness) or missing a wider social network (social loneliness). The 11-item De Jong Gierveld Loneliness Scale has proved to be a valid and reliable measurement instrument for

  9. Survey of Munitions Response Technologies

    Science.gov (United States)

    2006-06-01

    distributed between two operators and tied with an umbilical cord. Man-portable platforms are also being developed using wireless technology to reduce the...munitions response (Lim 2004, Bucaro 2006, Lavely 2006, Carroll 2006). Models are being validated using data measured in tanks and ponds and in offshore

  10. Evaluation of a skin self examination attitude scale using an item response theory model approach

    OpenAIRE

    Djaja, Ngadiman; Youl, Pip; Aitken, Joanne; Janda, Monika

    2014-01-01

    Introduction: The Skin Self-Examination Attitude Scale (SSEAS) is a brief measure that allows for the assessment of attitudes in relation to skin self-examination. This study evaluated the psychometric properties of the SSEAS using Item Response Theory (IRT) methods in a large sample of men ≥50 years in Queensland, Australia. Methods: A sample of 831 men (420 intervention and 411 control) completed a telephone assessment at the 13-month follow-up of a randomized-controlled trial of a video-bas...

  11. Measuring Consumers’ Environmental Responsibility: A Synthesis of Constructs and Measurement Scale Items

    Directory of Open Access Journals (Sweden)

    K. M. R. Taufique

    2014-04-01

    It is universally true that consumption is central to all production. Without proper management, production along with consumption is likely to be the main source of environmental problems. This reality calls for consumers to be environmentally responsible in their consumption behavior. The objective of this paper is to prepare a synthesis of all the possible factors and measurement scale items to be used for assessing consumers’ environmental responsibility. For making such a synthesis, all major works done in the field have been thoroughly reviewed. The paper comes up with a total of six parameters: knowledge & awareness, attitude, green consumer value, emotional affinity toward nature, willingness to act, and environment-related past behavior. This tentative yet inclusive set of parameters is thought to be useful for guiding the design of large-scale future empirical research aimed at developing a dependable, inclusive set of parameters to test consumers’ environmental responsibility. A conceptual model and possible measurement items are proposed for further empirical research.

  12. Which person variables predict how people benefit from True-False over Constructed Response items?

    Directory of Open Access Journals (Sweden)

    Stella Bollmann

    2015-06-01

    The aim of this study was the investigation of the variable Benefit from TF, which we assumed to be additionally measured when using True-False instead of Constructed Response tests. Subjects who benefit from True-False have an advantage over other subjects in answering Multiple Choice or True-False exams. We expected it to be related to partial knowledge and examined its relation to other personal abilities and traits in a total of n = 106 psychology students. They completed a statistics exam in Constructed Response and True-False format and benefit items were defined as those to which the associated constructed response answer was not correct. Additionally, verbal intelligence and Big 5 measures were obtained. Results confirm the existence of the person variable Benefit from TF and its relation to partial knowledge. Furthermore, benefiters differed from others in conscientiousness and openness to experience variables. However, contrary to expectations, they did not differ in verbal IQ.

  13. Self efficacy for fruit, vegetable and water intakes: Expanded and abbreviated scales from item response modeling analyses

    Directory of Open Access Journals (Sweden)

    Cullen Karen W

    2010-03-01

    Objective: To improve an existing measure of fruit and vegetable intake self efficacy by including items that varied on levels of difficulty, and to test a corresponding measure of water intake self efficacy. Design: Cross sectional assessment. Items were modified to have easy, moderate and difficult levels of self efficacy. Classical test theory and item response modeling were applied. Setting: One middle school at each of seven participating sites (Houston TX, Irvine CA, Philadelphia PA, Pittsburgh PA, Portland OR, rural NC, and San Antonio TX). Subjects: 714 6th grade students. Results: Adding items to reflect level (low, medium, high) of self efficacy for fruit and vegetable intake achieved scale reliability and validity comparable to existing scales, but the distribution of items across the latent variable did not improve. Selecting items from among clusters of items at similar levels of difficulty along the latent variable resulted in an abbreviated scale with psychometric characteristics comparable to the full scale, except for reliability. Conclusions: The abbreviated scale can reduce participant burden. Additional research is necessary to generate items that better distribute across the latent variable. Additional items may need to tap confidence in overcoming more diverse barriers to dietary intake.

  14. Content Validity and Psychometric Characteristics of the "Knowledge about Older Patients Quiz" for Nurses Using Item Response Theory.

    Science.gov (United States)

    Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J

    2016-11-01

    To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational…

  15. A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

    Science.gov (United States)

    Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

    2017-04-11

    The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy.

  16. An item response theory analysis of self-report measures of adult attachment.

    Science.gov (United States)

    Fraley, R C; Waller, N G; Brennan, K A

    2000-02-01

    Self-report measures of adult attachment are typically scored in ways (e.g., averaging or summing items) that can lead to erroneous inferences about important theoretical issues, such as the degree of continuity in attachment security and the differential stability of insecure attachment patterns. To determine whether existing attachment scales suffer from scaling problems, the authors conducted an item response theory (IRT) analysis of 4 commonly used self-report inventories: Experiences in Close Relationships scales (K. A. Brennan, C. L. Clark, & P. R. Shaver, 1998), Adult Attachment Scales (N. L. Collins & S. J. Read, 1990), Relationship Styles Questionnaire (D. W. Griffin & K. Bartholomew, 1994) and J. Simpson's (1990) attachment scales. Data from 1,085 individuals were analyzed using F. Samejima's (1969) graded response model. The authors' findings indicate that commonly used attachment scales can be improved in a number of important ways. Accordingly, the authors show how IRT techniques can be used to develop new attachment scales with desirable psychometric properties.
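
    The graded response model named in this abstract can be illustrated with a short Python sketch. It is not the authors' analysis: the function name, the discrimination value, and the thresholds below are hypothetical, chosen only to show how ordered-category probabilities arise as differences between adjacent cumulative logistic curves.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait value
    a: item discrimination (hypothetical value in the example below)
    thresholds: increasing boundaries b_1 < ... < b_{K-1} between K categories
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (k = 0) and 0 (k = K).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Made-up item with four ordered response categories.
probs = grm_category_probs(theta=0.5, a=1.7, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])
```

    Because the thresholds are increasing, every category probability is positive and the probabilities sum to one at any trait level.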

  17. Incorporating Mobility in Growth Modeling for Multilevel and Longitudinal Item Response Data.

    Science.gov (United States)

    Choi, In-Hee; Wilson, Mark

    2016-01-01

    Multilevel data often cannot be represented by the strict form of hierarchy typically assumed in multilevel modeling. A common example is the case in which subjects change their group membership in longitudinal studies (e.g., students transfer schools; employees transition between different departments). In this study, cross-classified and multiple membership models for multilevel and longitudinal item response data (CCMM-MLIRD) are developed to incorporate such mobility, focusing on students' school change in large-scale longitudinal studies. Furthermore, we investigate the effect of incorrectly modeling school membership in the analysis of multilevel and longitudinal item response data. Two types of school mobility are described, and corresponding models are specified. Results of the simulation studies suggested that appropriate modeling of the two types of school mobility using the CCMM-MLIRD yielded good recovery of the parameters and improvement over models that did not incorporate mobility properly. In addition, the consequences of incorrectly modeling the school effects on the variance estimates of the random effects and the standard errors of the fixed effects depended upon mobility patterns and model specifications. Two sets of large-scale longitudinal data are analyzed to illustrate applications of the CCMM-MLIRD for each type of school mobility.

  18. Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

    Science.gov (United States)

    Waller, Niels G; Feuerstahler, Leah

    2017-03-17

    In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ≥ 5,000) and person parameters can be accurately estimated in smaller samples (N ≥ 1,000). In the supplemental files, we report annotated R code that shows how to estimate 4PM item and person parameters in mirt (Chalmers, 2012).
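
    For readers unfamiliar with the 4PM, its item response function can be sketched in a few lines of Python. This is only an illustration of the standard four-parameter logistic form, not the authors' supplemental code; the function name and the parameter values in the example are assumptions.

```python
import math

def irf_4pm(theta, a, b, c, d):
    """Four-parameter logistic item response function.

    a: discrimination, b: difficulty,
    c: lower asymptote, d: upper asymptote (c < d).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: the curve runs from about 0.10 up to about 0.90.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(irf_4pm(theta, a=1.5, b=0.0, c=0.10, d=0.90), 3))
```

    Setting c = 0 and d = 1 recovers the two-parameter logistic model, which is one way to see the 4PM as a generalization of the simpler IRT models it is compared against.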

  19. Mathematical literacy examination items and student errors: An analysis of English Second Language students’ responses

    Directory of Open Access Journals (Sweden)

    Pamela Vale

    2013-04-01

    Full Text Available Mathematical literacy is a real-world practical attribute, yet students write a high-stakes examination in order to pass the subject Mathematical Literacy in the National Certificates (Vocational) (NC(V)). In these examinations, all sources of information are contextualised in language. It can be effortful for English second language students to decode text. The deliberate processing that is required saturates working memory and prevents these students from optimally engaging in problem solving. In this study, 15 items from an NC(V) Level 4 Mathematical Literacy examination are selected, as well as 15 student responses to each of these questions. From these responses, those which are incorrect are analysed to determine whether the error is due to insufficient mathematical literacy or a lack of English language proficiency. These results are used as an indication as to whether the examination is fair and valid for this group of students.

  20. Student Ratings of the Importance of Survey Items, Multiplicative Factor Analysis, and the Validity of the Community of Inquiry Survey

    Science.gov (United States)

    Diaz, Sebastian R.; Swan, Karen; Ice, Philip; Kupczynski, Lori

    2010-01-01

    This research builds upon prior validation studies of the Community of Inquiry (CoI) survey by utilizing multiple rating measures to validate the survey's tripartite structure (teaching presence, social presence, and cognitive presence). In prior studies exploring the construct validity of these 3 subscales, only respondents' course ratings were…

  1. Quantifying diagnostic uncertainty using item response theory: the Posterior Probability of Diagnosis Index.

    Science.gov (United States)

    Lindhiem, Oliver; Kolko, David J; Yu, Lan

    2013-06-01

    Using traditional Diagnostic and Statistical Manual of Mental Disorders, fourth edition, text revision (American Psychiatric Association, 2000) diagnostic criteria, clinicians are forced to make categorical decisions (diagnosis vs. no diagnosis). This forced choice implies that mental and behavioral health disorders are categorical and does not fully characterize varying degrees of uncertainty associated with a particular diagnosis. Using an item response theory (latent trait model) framework, we describe the development of the Posterior Probability of Diagnosis (PPOD) Index, which answers the question: What is the likelihood that a patient meets or exceeds the latent trait threshold for a diagnosis? The PPOD Index is based on the posterior distribution of θ (latent trait score) for each patient's profile of symptoms. The PPOD Index allows clinicians to quantify and communicate the degree of uncertainty associated with each diagnosis in probabilistic terms. We illustrate the advantages of the PPOD Index in a clinical sample (N = 321) of children and adolescents with oppositional defiant disorder.
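
    The idea of a posterior probability that a patient's latent trait meets or exceeds a diagnostic threshold can be sketched numerically. The Python below is not the published PPOD implementation: it assumes a two-parameter logistic model for binary symptoms, a standard normal prior, a simple grid approximation, and made-up item parameters, all of which are illustrative choices.

```python
import math

def posterior_prob_above(responses, items, threshold, grid_points=401):
    """Approximate P(theta >= threshold | responses) on a grid.

    responses: list of 0/1 symptom indicators
    items: list of (a, b) pairs, hypothetical 2PL discrimination/difficulty
    Assumes a standard normal prior on theta, evaluated over [-4, 4].
    """
    lo, hi = -4.0, 4.0
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for x, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            w *= p if x == 1 else (1.0 - p)  # likelihood of this response
        weights.append(w)
    total = sum(weights)
    above = sum(w for t, w in zip(grid, weights) if t >= threshold)
    return above / total

# Made-up parameters for four symptoms and a made-up threshold of 1.0.
items = [(1.2, -0.5), (1.5, 0.0), (0.9, 0.8), (1.8, 1.0)]
all_present = posterior_prob_above([1, 1, 1, 1], items, threshold=1.0)
none_present = posterior_prob_above([0, 0, 0, 0], items, threshold=1.0)
print(round(all_present, 3), round(none_present, 3))
```

    The output is a probability rather than a yes/no decision, which is the sense in which such an index communicates diagnostic uncertainty.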

  2. Use and Misuse of the Likert Item Responses and Other Ordinal Measures.

    Science.gov (United States)

    Bishop, Phillip A; Herron, Robert L

    Likert, Likert-type, and ordinal-scale responses are very popular psychometric item scoring schemes for attempting to quantify people's opinions, interests, or perceived efficacy of an intervention and are used extensively in Physical Education and Exercise Science research. However, these numbered measures are generally considered ordinal and violate some statistical assumptions needed to evaluate them as normally distributed, parametric data. This is an issue because parametric statistics are generally perceived as being more statistically powerful than non-parametric statistics. To avoid possible misinterpretation, care must be taken in analyzing these types of data. The use of visual analog scales may be equally efficacious and provide somewhat better data for analysis with parametric statistics.

  3. Capturing abnormal personality with normal personality inventories: an item response theory approach.

    Science.gov (United States)

    Walton, Kate E; Roberts, Brent W; Krueger, Robert F; Blonigen, Daniel M; Hicks, Brian M

    2008-12-01

    Correlational and factor-analytic methods indicate that abnormal and normal personality constructs may be tapping the same underlying latent trait. However, they do not systematically demonstrate that measures of abnormal personality capture more extreme ranges of the latent trait than measures of normal range personality. Item Response Theory (IRT) methods, in contrast, do provide this information. In the present study, we use IRT methods to evaluate the range of the latent trait assessed with a normal personality measure and a measure of psychopathy as one example of an abnormal personality construct. Contrary to the expectation that the measure of psychopathy would be more extreme than the measure of normal personality traits, the measures overlapped substantially in terms of the regions of the latent trait for which they provide information. Moreover, both types of inventories were limited in terms of measurement bandwidth, such that they did not provide information across the entire latent trait continuum. Implications and future directions are discussed.

  4. Evaluation of Buss-Perry Aggression Questionnaire with item response theory (IRT)

    Directory of Open Access Journals (Sweden)

    Dinić Bojana

    2012-01-01

    Full Text Available The aim of this research was to examine the psychometric properties of the Buss-Perry Aggression Questionnaire (AQ) in a Serbian sample, using the IRT model for graded responses. The AQ contains four subscales: Physical aggression, Verbal aggression, Hostility and Anger. The sample included 1272 participants of both genders, aged 18 to 68 years, with an average age of 31.39 years (SD = 12.63). Results of the IRT analysis suggested that the subscales were more informative in the range of above-average scores, that is, for participants with higher levels of aggressiveness. The exception was the Hostility subscale, which was informative over a wider range of the trait. On the other hand, this subscale contains two items that violate the assumption of homogeneity. Implications for the measurement of aggressiveness are discussed.

  5. Multi-Sensory Cognitive Learning as Facilitated in a Multimedia Tutorial for Item Response Theory

    Directory of Open Access Journals (Sweden)

    Chong Ho Yu

    2007-08-01

    Full Text Available The objective of this paper is to introduce an application of multi-sensory cognitive learning theory to the development of a multimedia tutorial for Item Response Theory. The cognitive multimedia theory suggests that visual and auditory material should be presented simultaneously to reinforce the retention of learned materials. A computer-assisted module was designed based upon this theory, and an experiment was conducted to examine the effect of audio type (human audio, computer audio, and no audio) on learner performance as measured by an objective test. It was found that while there was no significant performance gap between the human audio and the no audio groups, these two groups substantively outperformed the computer audio group. A plausible explanation is that unnatural audio requires additional cognitive power to process the information, and this distraction affects performance.

  6. Construction of a memory battery for computerized administration, using item response theory.

    Science.gov (United States)

    Ferreira, Aristides I; Almeida, Leandro S; Prieto, Gerardo

    2012-10-01

    In accordance with Item Response Theory, a computer memory battery with six tests was constructed for use in the Portuguese adult population. A factor analysis was conducted to assess the internal structure of the tests (N = 547 undergraduate students). According to the literature, several confirmatory factor models were evaluated. Results showed better fit of a model with two independent latent variables corresponding to verbal and non-verbal factors, reproducing the initial battery organization. Internal consistency reliabilities for the six tests ranged from alpha = .72 to .89. IRT analyses (Rasch and partial credit models) yielded good Infit and Outfit measures and high precision for parameter estimation. The potential utility of these memory tasks for psychological research and practice will be discussed.

  7. Dimensionality Assessment Using the Full-Information Item Bifactor Analysis for Graded Response Data: An Illustration with the State Metacognitive Inventory

    Science.gov (United States)

    Immekus, Jason C.; Imbrie, P. K.

    2008-01-01

    Dimensionality assessment using the full-information item bifactor model for graded response data is provided. The model applies to data in which each item relates to a general factor and one group factor. Specifically, alternative model specification within item response theory (IRT) is shown to test a scale's factor structure. For illustrative…

  8. Making Meaningful Measurement in Survey Research: The Use of Person and Item Maps

    Science.gov (United States)

    Royal, Kenneth D.

    2009-01-01

    Quality measurement is essential in every form of research, including institutional research and assessment. Unfortunately, most survey research today (both published and unpublished) is lacking with regards to quality measurement. Reporting means and standard deviations based on ordinal measures is an inappropriate, yet widespread practice in the…

  9. Item response theory applied to the Beck Depression Inventory

    Directory of Open Access Journals (Sweden)

    Stela Maris de Jezus Castro

    2010-09-01

    Full Text Available The Beck Depression Inventory (BDI), a scale that measures the latent trait of intensity of depressive symptoms, can be assessed using Item Response Theory (IRT). This study used the Graded Response Model (GRM) to assess the intensity of depressive symptoms in 4,025 individuals who responded to the BDI, in order to make efficient use of the information that this methodology makes available. The model was fitted in the PARSCALE software. We identified 13 BDI items in which at least one response category was not more likely than the others to be chosen, so these items had to be recategorized. The items with the greatest discriminating power relate to sadness, pessimism, sense of failure, dissatisfaction, self-dislike, indecision, and difficulty working. The most severe items are those related to weight loss, social withdrawal, and suicidal ideas. The group of 202 individuals with the highest intensities of depressive symptoms was 74% women, and almost 84% had a diagnosis of some psychiatric disorder. The results show some of the many gains that come from using IRT in the analysis of latent traits.

  10. Stigma in Canada: Results From a Rapid Response Survey

    Science.gov (United States)

    Stuart, Heather; Patten, Scott B; Koller, Michelle; Modgill, Geeta; Liinamaa, Tiina

    2014-01-01

    Objective: Our paper presents findings from the first population survey of stigma in Canada using a new measure of stigma. Empirical objectives are to provide a descriptive profile of Canadians' expectations that people will devalue and discriminate against someone with depression, and to explore the relation between experiences of being stigmatized in the year prior to the survey among people having been treated for a mental illness with a selected number of sociodemographic and mental health–related variables. Method: Data were collected by Statistics Canada using a rapid response format on a representative sample of Canadians (n = 10 389) during May and June of 2010. Public expectations of stigma and personal experiences of stigma in the subgroup receiving treatment for a mental illness were measured. Results: Over one-half of the sample endorsed 1 or more of the devaluation discrimination items, indicating that they believed Canadians would stigmatize someone with depression. The item most frequently endorsed concerned employers not considering an application from someone who has had depression. Over one-third of people who had received treatment in the year prior to the survey reported discrimination in 1 or more life domains. Experiences of discrimination were strongly associated with perceptions that Canadians would devalue someone with depression, younger age (12 to 15 years), and self-reported poor general mental health. Conclusions: The Mental Health Experiences Module reflects an important partnership between 2 national organizations that will help Canada fulfill its monitoring obligations under the United Nations Convention on the Rights of Persons with Disabilities and provide a legacy to researchers and policy-makers who are interested in monitoring changes in stigma over time. PMID: 25565699

  11. Item Response Theory Analyses of the Parent and Teacher Ratings of the DSM-IV ADHD Rating Scale

    Science.gov (United States)

    Gomez, Rapson

    2008-01-01

    The graded response model (GRM), which is based on item response theory (IRT), was used to evaluate the psychometric properties of the inattention and hyperactivity/impulsivity symptoms in an ADHD rating scale. To accomplish this, parents and teachers completed the DSM-IV ADHD Rating Scale (DARS; Gomez et al., "Journal of Child Psychology and…

  12. Cultural Resources Survey of Three Iberville Parish Levee Enlargement and Revetment Construction Items

    Science.gov (United States)

    1993-09-22

    and four feet front, and forty arpents in depth, and bounded on one side by land of Bonaventura Leblanc, and on the other by Juan Hebert. It appears...and on the lower by land of Bonaventura Forest. This land was surveyed by Don Luis Andry, in the year 1772, in favor of the claimant, who obtained a...1772, in favor of Bonaventura Forest, who obtained a complete grant for the same in the year 1774, from Governor Unzaga; under which grant the

  13. An item response theory analysis of the Executive Interview and development of the EXIT8: A Project FRONTIER Study.

    Science.gov (United States)

    Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E

    2015-01-01

    The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.

  14. Maximising response rates in household telephone surveys

    Directory of Open Access Journals (Sweden)

    Sinclair Martha

    2008-11-01

    Full Text Available Background: Epidemiological and other studies that require participants to respond by completing a questionnaire face the growing threat of non-response. Response rates to household telephone surveys are diminishing because of changes in telecommunications, marketing and culture. Accordingly, updated information is required about the rate of telephone listing in directories and optimal strategies to maximise survey participation. Methods: A total of 3426 households in Sydney, Australia were approached to participate in a computer assisted telephone interview (CATI) regarding their domestic (recycled and/or drinking) water usage. Only randomly selected households in the suburb and postcode of interest with a telephone number listed in the Electronic White Pages (EWP) that matched Australian electoral records were approached. Results: The CATI response rate for eligible households contacted by telephone was 39%. The rate of matching of electoral and EWP records, a measure of telephone directory coverage, was 55%. Conclusion: The use of a combination of approaches, such as an advance letter, interviewer training, establishment of researcher credentials, increasing call attempts and targeted call times, remains a good strategy to maximise telephone response rates. However, by way of preparation for future technological changes, reduced telephone number listings and people's increasing resistance to unwanted phone calls, alternatives to telephone surveys, such as internet-based approaches, should be investigated.

  15. Maximising response rates in household telephone surveys.

    Science.gov (United States)

    O'Toole, Joanne; Sinclair, Martha; Leder, Karin

    2008-11-03

    Epidemiological and other studies that require participants to respond by completing a questionnaire face the growing threat of non-response. Response rates to household telephone surveys are diminishing because of changes in telecommunications, marketing and culture. Accordingly, updated information is required about the rate of telephone listing in directories and optimal strategies to maximise survey participation. A total of 3426 households in Sydney, Australia were approached to participate in a computer assisted telephone interview (CATI) regarding their domestic (recycled and/or drinking) water usage. Only randomly selected households in the suburb and postcode of interest with a telephone number listed in the Electronic White Pages (EWP) that matched Australian electoral records were approached. The CATI response rate for eligible households contacted by telephone was 39%. The rate of matching of electoral and EWP records, a measure of telephone directory coverage, was 55%. The use of a combination of approaches, such as an advance letter, interviewer training, establishment of researcher credentials, increasing call attempts and targeted call times, remains a good strategy to maximise telephone response rates. However, by way of preparation for future technological changes, reduced telephone number listings and people's increasing resistance to unwanted phone calls, alternatives to telephone surveys, such as internet-based approaches, should be investigated.

  16. A psychometric analysis of the Trait Emotional Intelligence Questionnaire-Short Form (TEIQue-SF) using item response theory.

    Science.gov (United States)

    Cooper, Andrew; Petrides, K V

    2010-09-01

    Trait emotional intelligence refers to a constellation of emotional self-perceptions located at the lower levels of personality hierarchies. In 2 studies, we sought to examine the psychometric properties of the Trait Emotional Intelligence Questionnaire-Short Form (TEIQue-SF; Petrides, 2009) using item response theory (IRT). Study 1 (N = 1,119, 455 men) showed that most items had good discrimination and threshold parameters and high item information values. At the global level, the TEIQue-SF showed very good precision across most of the latent trait range. Study 2 (N = 866, 432 men) used similar IRT techniques in a new sample based on the latest version of the TEIQue-SF (version 1.50). Results replicated Study 1, with the instrument showing good psychometric properties at the item and global level. Overall, the 2 studies suggest the TEIQue-SF can be recommended when a rapid assessment of trait emotional intelligence is required.

  17. Psychometric Properties of the Brazilian 12-Item Short-Form Health Survey Version 2 (SF-12v2)

    Directory of Open Access Journals (Sweden)

    Bruno Figueiredo Damásio

    2015-04-01

    Full Text Available The 12-Item Short-Form Health Survey, in its initial (SF-12) and revised (SF-12v2) forms, is a widely used measure to evaluate health-related quality of life (HRQoL). The present study evaluates the factor structure and reliability of the Brazilian version of the SF-12v2. Participants were 627 subjects (74.1% women), aged from 18 to 88 years (M = 38.6; SD = 13.16), from 17 Brazilian states. Confirmatory factor analyses suggested two pairs of error terms to be highly correlated (3a-3b and 4a-4b). A qualitative inspection showed an overlap of content among these items. The respecified model presented adequate fit indices. Convergent validity was also tested with measures of health-related self-care, subjective happiness, life satisfaction, depression and self-efficacy. Expected correlations were found between the SF-12v2 and these measures. Results showed initial evidence in favor of using the SF-12v2 as a measure of physical and mental health in the Brazilian context.

  18. Detection and validation of unscalable item score patterns using item response theory: an illustration with Harter's Self-Perception Profile for Children.

    Science.gov (United States)

    Meijer, Rob R; Egberink, Iris J L; Emons, Wilco H M; Sijtsma, Klaas

    2008-05-01

    We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with care and caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.

  19. A survey of anatomical items relevant to the practice of rheumatology: upper extremity, head, neck, spine, and general concepts.

    Science.gov (United States)

    Villaseñor-Ovies, Pablo; Navarro-Zarza, José Eduardo; Saavedra, Miguel Ángel; Hernández-Díaz, Cristina; Canoso, Juan J; Biundo, Joseph J; Kalish, Robert A; de Toro Santos, Francisco Javier; McGonagle, Dennis; Carette, Simon; Alvarez-Nemegyei, José

    2016-12-01

    This study aimed to identify the anatomical items of the upper extremity and spine that are potentially relevant to the practice of rheumatology. Ten rheumatologists interested in clinical anatomy who published, taught, and/or participated as active members of Clinical Anatomy Interest groups (six seniors, four juniors) participated in a one-round relevance Delphi exercise. An initial, 560-item list that included 45 (8.0 %) general concepts items; 138 (24.8 %) hand items; 100 (17.8 %) forearm and elbow items; 147 (26.2 %) shoulder items; and 130 (23.2 %) head, neck, and spine items was compiled by 5 of the participants. Each item was graded for importance with a Likert scale from 1 (not important) to 5 (very important). Thus, summed item scores could range from 10 (1 × 10) to 50 (5 × 10). An item score of ≥40 was considered most relevant to competent practice as a rheumatologist. Mean item Likert scores ranged from 2.2 ± 0.5 to 4.6 ± 0.7. A total of 115 (20.5 %) of the 560 initial items reached relevance. Broken down by categories, this final relevant item list was composed of 7 (6.1 %) general concepts items; 32 (27.8 %) hand items; 20 (17.4 %) forearm and elbow items; 33 (28.7 %) shoulder items; and 23 (17.6 %) head, neck, and spine items. In this Delphi exercise, a group of practicing academic rheumatologists with an interest in clinical anatomy compiled a list of anatomical items that were deemed important to the practice of rheumatology. We suggest these items be considered curricular priorities when training rheumatology fellows in clinical anatomy skills and in programs of continuing rheumatology education.
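
    The scoring rule described in this abstract (ten raters, a 1-5 Likert grade each, relevance at a summed score of 40 or more) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the item names and ratings are hypothetical, and only the rater count, the scale bounds, and the cutoff come from the abstract.

```python
from typing import Dict, List

N_RATERS = 10             # number of rheumatologists stated in the abstract
RELEVANCE_THRESHOLD = 40  # summed-score cutoff stated in the abstract


def item_score(ratings: List[int]) -> int:
    """Sum the Likert grades (1-5) that the ten raters gave one item."""
    if len(ratings) != N_RATERS:
        raise ValueError(f"expected {N_RATERS} ratings, got {len(ratings)}")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("Likert grades must lie between 1 and 5")
    return sum(ratings)


def relevant_items(ratings_by_item: Dict[str, List[int]]) -> List[str]:
    """Return the names of items whose summed score reaches the cutoff."""
    return [
        name
        for name, ratings in ratings_by_item.items()
        if item_score(ratings) >= RELEVANCE_THRESHOLD
    ]


# Hypothetical ratings, invented purely for illustration.
example_ratings = {
    "item_a": [5, 4, 5, 4, 4, 5, 4, 4, 5, 4],
    "item_b": [3, 2, 3, 3, 2, 3, 2, 3, 3, 2],
}

if __name__ == "__main__":
    for name, ratings in example_ratings.items():
        print(name, item_score(ratings))
    print("relevant:", relevant_items(example_ratings))
```

    Because every rater contributes between 1 and 5, the summed score is bounded by 10 and 50, which matches the range given in the abstract.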

  20. Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

    Science.gov (United States)

    Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

    2011-01-01

    A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with…

  1. Further Simplification of the Simple Erosion Narrowing Score With Item Response Theory Methodology.

    Science.gov (United States)

    Oude Voshaar, Martijn A H; Schenk, Olga; Ten Klooster, Peter M; Vonkeman, Harald E; Bernelot Moens, Hein J; Boers, Maarten; van de Laar, Mart A F J

    2016-08-01

    To further simplify the simple erosion narrowing score (SENS) by removing scored areas that contribute the least to its measurement precision according to analysis based on item response theory (IRT) and to compare the measurement performance of the simplified version to the original. Baseline and 18-month data of the Combinatietherapie Bij Reumatoide Artritis (COBRA) trial were modeled using longitudinal IRT methodology. Measurement precision was evaluated across different levels of structural damage. SENS was further simplified by omitting the least reliably scored areas. Discriminant validity of SENS and its simplification were studied by comparing their ability to differentiate between the COBRA and sulfasalazine arms. Responsiveness was studied by comparing standardized change scores between versions. SENS data showed good fit to the IRT model. Carpal and feet joints contributed the least statistical information to both erosion and joint space narrowing scores. Omitting the joints of the foot reduced measurement precision for the erosion score in cases with below-average levels of structural damage (relative efficiency compared with the original version ranged 35-59%). Omitting the carpal joints had minimal effect on precision (relative efficiency range 77-88%). Responsiveness of a simplified SENS without carpal joints closely approximated the original version (i.e., all Δ standardized change scores were ≤0.06). Discriminant validity was also similar between versions for both the erosion score (relative efficiency = 97%) and the SENS total score (relative efficiency = 84%). Our results show that the carpal joints may be omitted from the SENS without notable repercussion for its measurement performance. © 2016, American College of Rheumatology.

  2. Validity of a Diagnostic Scale for Acupuncture: Application of the Item Response Theory to the Five Viscera Score

    Directory of Open Access Journals (Sweden)

    Taro Tomura

    2013-01-01

    In acupuncture therapy, diagnosis, acupoints, and stimulation for patients with the same illness are often inconsistent among Traditional Chinese Medicine (TCM) practitioners. This is in part due to the paucity of evidence-based diagnostic methods in TCM. To solve this problem, a validated diagnostic tool needs to be established. We first applied the Item Response Theory (IRT) model to the Five Viscera Score (FVS) to test its validity by evaluating the ability of the questionnaire items to identify an individual's latent traits. Next, the health-related QOL scale (SF-36), a suitable instrument for evaluating acupuncture therapy, was administered to evaluate whether the FVS can be used to make a health-related diagnosis. All 20 items of the FVS had adequate item discrimination, and 13 items had high item discrimination power. Measurement accuracy was suited for application in a range of individuals, from healthy to symptomatic. When the FVS and SF-36 were administered to other subjects, some of whom overlapped with the first group, we found an association between the two scales, and the same findings were obtained when symptomatic and asymptomatic subjects were compared regardless of age and sex. In conclusion, the FVS may be effective in clinical diagnosis.

  3. Test of item-response bias in the CES-D scale. Experience from the New Haven EPESE study.

    Science.gov (United States)

    Cole, S R; Kawachi, I; Maller, S J; Berkman, L F

    2000-03-01

    We present results of item-response bias analyses of the exogenous variables age, gender, and race for all items from the Center for Epidemiologic Studies Depression (CES-D) scale using data (N = 2340) from the New Haven component of the Established Populations for Epidemiologic Studies of the Elderly (EPESE). The proportional odds of blacks responding higher on the CES-D items "people are unfriendly" and "people dislike me" were 2.29 (95% confidence interval: 1.74, 3.02) and 2.96 (95% confidence interval: 2.15, 4.07) times that of whites matched on overall depressive symptoms, respectively. In addition, the proportional odds of women responding higher on the CES-D item "crying spells" were 2.14 (95% confidence interval: 1.60, 2.82) times that of men matched on overall depressive symptoms. Our data indicate the CES-D would have greater validity among this diverse group of older men and women after removal of the crying item and two interpersonal items.

  4. Use of Item Response Theory to Examine a Cardiovascular Health Knowledge Measure for Adolescents with Elevated Blood Pressure

    Directory of Open Access Journals (Sweden)

    Stephanie L. Fitzpatrick

    2012-10-01

    The purpose of this study was to assess the psychometric properties of a cardiovascular health knowledge measure for adolescents using item response theory. The measure was developed in the context of a cardiovascular lifestyle intervention for adolescents with elevated blood pressure. The sample consisted of 167 adolescents (mean age = 16.2 years) who completed the Cardiovascular Health Knowledge Assessment (CHKA), a 34-item multiple choice test, at baseline and post-intervention. The CHKA was unidimensional and internal consistency was .65 at pretest and .74 at posttest. Rasch analysis results indicated that at pretest the items targeted adolescents with variable levels of health knowledge. However, based on results at posttest, additional hard items are needed to account for the increase in level of cardiovascular health knowledge at post-intervention. Change in knowledge scores was examined using Rasch analysis. Findings indicated there was significant improvement in health knowledge over time [t(119) = -10.3, p < .0001]. In summary, the CHKA appears to contain items that are good approximations of the construct cardiovascular health knowledge and items that target adolescents with moderate levels of knowledge. DOI: 10.2458/azu_jmmss.v3i1.16111

  5. Item-response-theory analysis of two scales for self-efficacy for exercise behavior in people with arthritis.

    Science.gov (United States)

    Mielenz, Thelma J; Edwards, Michael C; Callahan, Leigh F

    2011-07-01

    Benefits of physical activity for those with arthritis are clear, yet physical activity is difficult to initiate and maintain. Self-efficacy is a key modifiable psychosocial determinant of physical activity. This study examined two scales for self-efficacy for exercise behavior (SEEB) to identify their strengths and weaknesses using item response theory (IRT) from community-based randomized controlled trials of physical activity programs in adults with arthritis. The 2 SEEB scales included the 9-item scale by Resnick developed with older adults and the 5-item scale by Marcus developed with employed adults. All IRT analyses were conducted using the graded-response model. IRT assumptions were assessed using both exploratory and confirmatory factor analysis. The IRT analyses indicated that these scales are precise and reliable measures for identifying people with arthritis and low SEEB. The Resnick SEEB scale is slightly more precise at lower levels of self-efficacy in older adults with arthritis.
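
    The graded-response model named in this abstract assigns each ordered response category a probability that depends on the latent trait. The sketch below uses the standard logistic form, in which the probability of responding in category k or higher is 1 / (1 + exp(-a(θ - b_k))). It is an illustration under assumed values: the discrimination `a` and thresholds `b_k` are hypothetical and are not estimates from the study.

```python
import math
from typing import List


def grm_category_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Category probabilities under the logistic graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (hypothetical value in the demo below)
    thresholds -- increasing category thresholds b_1 < ... < b_{K-1}
    Returns K probabilities, one per ordered response category.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1
    # and P(X >= K) = 0 so that adjacent differences give categories.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative += [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical 4-category item evaluated at three trait levels.
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_category_probs(theta, a=1.5, thresholds=[-1.0, 0.0, 1.0])
        print(theta, [round(p, 3) for p in probs])
```

    At low trait levels most of the probability mass sits in the lowest category, which is why thresholds located at low θ make a scale more informative for respondents with low self-efficacy.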

  6. A note on monotone likelihood ratio of the total score variable in unidimensional item response theory.

    Science.gov (United States)

    Ünlü, Ali

    2008-05-01

    This note provides a direct, elementary proof of the fundamental result on monotone likelihood ratio of the total score variable in unidimensional item response theory (IRT). This result is very important for practical measurement in IRT, because it justifies the use of the total score variable to order participants on the latent trait. The proof relies on a basic inequality for elementary symmetric functions which is proved by means of a few purely algebraic, straightforward transformations. In particular, flaws in a proof of this result by Huynh [(1994). A new proof for monotone likelihood ratio for the sum of independent Bernoulli random variables. Psychometrika, 59, 77-79] are pointed out and corrected, and a natural generalization of the fundamental result to non-linear (quasi-ordered) latent trait spaces is presented. This may be useful for multidimensional IRT or knowledge space theory, in which the latent 'ability' spaces are partially ordered with respect to, for instance, coordinate-wise vector-ordering or set-inclusion, respectively.
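
    The monotone likelihood ratio property says that, for two trait levels θ₁ < θ₂, the ratio P(X₊ = s | θ₂) / P(X₊ = s | θ₁) does not decrease as the total score s increases. The sketch below checks this numerically for a small set of two-parameter logistic items; it is a numerical illustration, not the algebraic proof the note describes. The item parameters are hypothetical, and the total-score distribution is built with the usual one-item-at-a-time recursion for a sum of independent Bernoulli variables.

```python
import math
from typing import List, Tuple

Item = Tuple[float, float]  # (discrimination a, difficulty b), hypothetical values


def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def total_score_dist(theta: float, items: List[Item]) -> List[float]:
    """Distribution of the total score given theta, assuming local independence.

    Items are added one at a time: each new item either leaves the running
    score unchanged (incorrect) or raises it by one (correct).
    """
    dist = [1.0]  # with zero items the score is 0 with probability 1
    for a, b in items:
        p = p_correct(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - p)
            new[score + 1] += prob * p
        dist = new
    return dist


def has_monotone_likelihood_ratio(theta_low: float, theta_high: float,
                                  items: List[Item]) -> bool:
    """Check that P(s | theta_high) / P(s | theta_low) is nondecreasing in s."""
    low = total_score_dist(theta_low, items)
    high = total_score_dist(theta_high, items)
    ratios = [h / l for h, l in zip(high, low)]
    # A small relative tolerance absorbs floating-point rounding.
    return all(r2 >= r1 * (1.0 - 1e-9) for r1, r2 in zip(ratios, ratios[1:]))


# Hypothetical item parameters, for illustration only.
demo_items: List[Item] = [(0.8, -1.0), (1.2, 0.0), (1.5, 0.5), (2.0, 1.0)]

if __name__ == "__main__":
    print(has_monotone_likelihood_ratio(-1.0, 1.0, demo_items))
```

    A passing check on one parameter set is of course only a spot check; the general statement is what the note proves.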

  7. Interpreting gains and losses in conceptual test using Item Response Theory

    CERN Document Server

    Lamine, Brahim

    2015-01-01

    Conceptual tests are widely used by physics instructors to assess students' conceptual understanding and compare teaching methods. It is common to look at students' changes in their answers between a pre-test and a post-test to quantify a transition in students' conceptions. This is often done by looking at the proportion of incorrect answers in the pre-test that changes to correct answers in the post-test -- the gain -- and the proportion of correct answers that changes to incorrect answers -- the loss. By comparing theoretical predictions to experimental data on the Force Concept Inventory, we show that Item Response Theory (IRT) is able to fairly well predict the observed gains and losses. We then use IRT to quantify the students' changes in a test-retest situation when no learning occurs and show that (i) up to 25% of total answers can change due to the non-deterministic nature of students' answers and that (ii) gains and losses can go from 0% to 100%. Still using IRT, we highlight the conditions tha...

  8. Development of new physical activity and sedentary behavior change self-efficacy questionnaires using item response modeling

    Science.gov (United States)

    Theoretically, increased levels of physical activity self-efficacy (PASE) should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM) was...

  9. Negative affectivity and social inhibition in cardiovascular disease: evaluating type-D personality and its assessment using item response theory

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, Rob R.; Denollet, Johan

    2007-01-01

    Objective: Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI)—referred to as type-D personality—are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The obje

  10. The Divergent Meanings of Life Satisfaction: Item Response Modeling of the Satisfaction with Life Scale in Greenland and Norway

    Science.gov (United States)

    Vitterso, Joar; Biswas-Diener, Robert; Diener, Ed

    2005-01-01

    Cultural differences in response to the Satisfaction With Life Scale (SWLS) items are investigated. Data were fit to a mixed Rasch model in order to identify latent classes of participants in a combined sample of Norwegians (N = 461) and Greenlanders (N = 180). Initial analyses showed no mean difference in life satisfaction between the two…

  11. An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

    Science.gov (United States)

    Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

    2013-01-01

    Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

  12. Increasing the Number of Replications in Item Response Theory Simulations: Automation through SAS and Disk Operating System

    Science.gov (United States)

    Gagne, Phill; Furlow, Carolyn; Ross, Terris

    2009-01-01

    In item response theory (IRT) simulation research, it is often necessary to use one software package for data generation and a second software package to conduct the IRT analysis. Because this can substantially slow down the simulation process, it is sometimes offered as a justification for using very few replications. This article provides…

  13. A multidimensional item response model : Constrained latent class analysis using the Gibbs sampler and posterior predictive checks

    NARCIS (Netherlands)

    Hoijtink, H; Molenaar, IW

    1997-01-01

    In this paper it will be shown that a certain class of constrained latent class models may be interpreted as a special case of nonparametric multidimensional item response models. The parameters of this latent class model will be estimated using an application of the Gibbs sampler. It will be illust

  14. Computer adaptive practice of Maths ability using a new item response model for on the fly ability and difficulty estimation

    NARCIS (Netherlands)

    Klinkenberg, S.; Straatemeier, M.; van der Maas, H.L.J.

    2011-01-01

    In this paper we present a model for computerized adaptive practice and monitoring. This model is used in the Maths Garden, a web-based monitoring system, which includes a challenging web environment for children to practice arithmetic. Using a new item response model based on the Elo (1978) rating
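
    An Elo-style rating scheme updates a person's ability and an item's difficulty after every response, which is what makes on-the-fly estimation possible. The sketch below shows the plain Elo update with a logistic expected score. The abstract above is truncated, so this is not the scoring rule of the published model; the step size `k` and the starting values are assumptions made for illustration.

```python
import math
from typing import Tuple


def expected_score(ability: float, difficulty: float) -> float:
    """Logistic expected probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def elo_update(ability: float, difficulty: float, correct: bool,
               k: float = 0.3) -> Tuple[float, float]:
    """Return updated (ability, difficulty) after one response.

    k is a hypothetical step size. The person's rating moves toward the
    observed outcome and the item's rating moves the opposite way.
    """
    surprise = (1.0 if correct else 0.0) - expected_score(ability, difficulty)
    return ability + k * surprise, difficulty - k * surprise


if __name__ == "__main__":
    ability, difficulty = 0.0, 0.0  # assumed starting values
    for outcome in (True, True, False, True):
        ability, difficulty = elo_update(ability, difficulty, outcome)
        print(round(ability, 3), round(difficulty, 3))
```

    An unexpected outcome (a correct answer on an item rated far above the person, say) produces a large update, while an expected outcome barely changes either rating.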

  15. Development of an Abbreviated Social Phobia and Anxiety Inventory (SPAI) Using Item Response Theory: The SPAI-23

    Science.gov (United States)

    Roberson-Nay, Roxann; Strong, David R.; Nay, William T.; Beidel, Deborah C.; Turner, Samuel M.

    2007-01-01

    An abbreviated version of the Social Phobia and Anxiety Inventory (SPAI) was developed using methods based in nonparametric item response theory. Participants included a nonclinical sample of 1,482 undergraduates (52% female, mean age = 19.4 years) as well as a clinical sample of 105 individuals (56% female, mean age = 36.4 years) diagnosed with…

  16. An Exploratory Study of the Applicability of Item Response Theory Methods to the Graduate Management Admission Test.

    Science.gov (United States)

    Kingston, Neal; And Others

    A necessary prerequisite to the operational use of item response theory (IRT) in any testing program is the investigation of the feasibility of such an approach. This report presents the results of such research for the Graduate Management Admission Test (GMAT). Despite the fact that GMAT data appear to violate a basic assumption of the…

  18. The Final 24-Item Early Onset Scoliosis Questionnaires (EOSQ-24): Validity, Reliability and Responsiveness.

    Science.gov (United States)

    Matsumoto, Hiroko; Williams, Brendan; Park, Howard Y; Yoshimachi, Julie Y; Roye, Benjamin D; Roye, David P; Akbarnia, Behrooz A; Emans, John; Skaggs, David; Smith, John T; Vitale, Michael G

    2016-06-13

    The goal of early-onset scoliosis (EOS) treatment is to improve health-related quality of life (HRQoL) for patients and to reduce the burden on their parents or caregivers. The purpose of this study is to develop and finalize the 24-item Early-Onset Scoliosis Questionnaire (EOSQ-24), and examine the validity, reliability, and responsiveness of the EOSQ-24 in measuring patients' HRQoL, the burden on their caregivers, and the burden on their caregiver's finances. The study also established age-matched normative values for the EOSQ-24. The EOSQ-24 was administered to caregivers of male and female patients aged 0 to 18 years with EOS. Patients with EOS are diagnosed before 10 years of age. Criterion validity was investigated by measuring agreement between its scores and pulmonary function testing. Construct validity was established by comparing values across different etiology groups using the known-group method, and measuring internal consistency reliability. Content validity was confirmed by reviewing caregiver and health provider ratings for the relativity and clarity of the EOSQ-24 questions. Test-retest reliability was examined through intraclass correlation coefficients. Responsiveness of the EOSQ-24 before and after surgical interventions was also investigated. Age-matched, healthy patients, without spinal deformity, were enrolled to establish normative EOSQ-24 values. The pulmonary function subdomain score in the EOSQ-24 was positively correlated with pulmonary function testing values, establishing criterion validity. The EOSQ-24 scores for neuromuscular patients were significantly decreased compared with idiopathic or congenital/structural patients, demonstrating known-group validity. Internal consistency reliability of patients' HRQoL was excellent (0.92), but Family Burden was questionable (0.64) indicating that Parental Burden and Financial Burden should be in separate domains. All 24 EOSQ items were rated as essential and clear, confirming content

  19. The Effect of Using Different Weights for Multiple-Choice and Free-Response Item Sections

    Science.gov (United States)

    Hendrickson, Amy; Patterson, Brian; Melican, Gerald

    2008-01-01

    Presented at the Annual National Council on Measurement in Education (NCME) in New York in March 2008. This presentation explores how different item weighting can affect the effective weights, validity coefficients and test reliability of composite scores among test takers.
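
    The distinction between nominal and effective weights can be illustrated with a small calculation. One common definition takes the effective weight of a section as its nominal weight times the covariance of the section score with the composite, divided by the composite variance, so that the effective weights sum to 1. The sketch below uses that definition as an assumption; the presentation's own method is not described in this record, and all scores are invented.

```python
from statistics import mean
from typing import List


def covariance(x: List[float], y: List[float]) -> float:
    """Sample covariance of two equally long score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def effective_weights(sections: List[List[float]],
                      nominal: List[float]) -> List[float]:
    """Share of composite variance attributable to each weighted section."""
    n_people = len(sections[0])
    composite = [
        sum(w * scores[i] for w, scores in zip(nominal, sections))
        for i in range(n_people)
    ]
    composite_var = covariance(composite, composite)
    return [
        w * covariance(scores, composite) / composite_var
        for w, scores in zip(nominal, sections)
    ]


# Hypothetical scores for five test takers, for illustration only.
multiple_choice = [30.0, 42.0, 35.0, 50.0, 44.0]
free_response = [8.0, 12.0, 9.0, 15.0, 10.0]

if __name__ == "__main__":
    weights = effective_weights([multiple_choice, free_response], [1.0, 1.0])
    print([round(w, 3) for w in weights])
```

    With equal nominal weights, the section with the larger score variance dominates the composite, which is why nominal and effective weights can differ noticeably.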

  20. The nature of phonological awareness throughout the elementary grades: An item response theory perspective

    NARCIS (Netherlands)

    Vloedgraven, J.M.T.; Verhoeven, L.T.W.

    2009-01-01

    In the present study, the nature of Dutch children's phonological awareness was examined throughout the elementary school grades. Phonological awareness was assessed using five different sets of items that measured rhyming, phoneme identification, phoneme blending, phoneme segmentation, and phoneme

  2. Modeling Unproductive Behavior in Online Homework in Terms of Latent Student Traits: An Approach Based on Item Response Theory

    Science.gov (United States)

    Gönülateş, Emre; Kortemeyer, Gerd

    2017-04-01

    Homework is an important component of most physics courses. One of the functions it serves is to provide meaningful formative assessment in preparation for examinations. However, correlations between homework and examination scores tend to be low, likely due to unproductive student behavior such as copying and random guessing of answers. In this study, we attempt to model these two counterproductive learner behaviors within the framework of Item Response Theory in order to provide an ability measurement that strongly correlates with examination scores. We find that introducing additional item parameters leads to worse predictions of examination grades, while introducing additional learner traits is a more promising approach.

  3. Modeling Unproductive Behavior in Online Homework in Terms of Latent Student Traits: An Approach Based on Item Response Theory

    Science.gov (United States)

    Gönülateş, Emre; Kortemeyer, Gerd

    2016-10-01

    Homework is an important component of most physics courses. One of the functions it serves is to provide meaningful formative assessment in preparation for examinations. However, correlations between homework and examination scores tend to be low, likely due to unproductive student behavior such as copying and random guessing of answers. In this study, we attempt to model these two counterproductive learner behaviors within the framework of Item Response Theory in order to provide an ability measurement that strongly correlates with examination scores. We find that introducing additional item parameters leads to worse predictions of examination grades, while introducing additional learner traits is a more promising approach.

  4. Simulation-based Bayesian inference for latent traits of item response models: Introduction to the ltbayes package for R.

    Science.gov (United States)

    Johnson, Timothy R; Kuhn, Kristine M

    2015-12-01

    This paper introduces the ltbayes package for R. This package includes a suite of functions for investigating the posterior distribution of latent traits of item response models. These include functions for simulating realizations from the posterior distribution, profiling the posterior density or likelihood function, calculation of posterior modes or means, Fisher information functions and observed information, and profile likelihood confidence intervals. Inferences can be based on individual response patterns or sets of response patterns such as sum scores. Functions are included for several common binary and polytomous item response models, but the package can also be used with user-specified models. This paper introduces some background and motivation for the package, and includes several detailed examples of its use.

  5. Developing item banks for measuring pediatric generic health-related quality of life: an application of the International Classification of Functioning, Disability and Health for Children and Youth and item response theory.

    Science.gov (United States)

    Gandhi, Pranav K; Thompson, Lindsay A; Tuli, Sanjeev Y; Revicki, Dennis A; Shenkman, Elizabeth; Huang, I-Chan

    2014-01-01

    The purpose of this study was to develop item banks by linking items from three pediatric health-related quality of life (HRQoL) instruments using a mixed methodology. Secondary data were collected from 469 parents of children aged 8-16 years. The International Classification of Functioning, Disability and Health-Children and Youth (ICF-CY) served as a framework to compare the concepts of items from three HRQoL instruments. The structural validity of the individual domains was examined using confirmatory factor analyses. Samejima's Graded Response Model was used to calibrate items from different instruments. The known-groups validity of each domain was examined using the status of children with special health care needs (CSHCN). Concepts represented by the items in the three instruments were linked to 24 different second-level categories of the ICF-CY. Eight item banks representing eight unidimensional domains were created based on the linkage of the concepts measured by the items of the three instruments to the ICF-CY. The HRQoL results of CSHCN in seven out of eight domains (except personality) were significantly lower compared with children without special health care needs (p<0.05). This study demonstrates a useful approach to compare the item concepts from the three instruments and to generate item banks for a pediatric population.

  6. Shortening a survey and using alternative forms of prenotification: Impact on response rate and quality

    Directory of Open Access Journals (Sweden)

    Jenkins Sarah

    2010-06-01

    Background: Evidence suggests that survey response rates are decreasing and that the level of survey response can be influenced by questionnaire length and the use of pre-notification. The goal of the present investigation was to determine the effect of questionnaire length and pre-notification type (letter vs. postcard) on measures of survey quality, including response rates, response times (days to return the survey), and item nonresponse. Methods: In July 2008, the authors randomized 900 residents of Olmsted County, Minnesota aged 25-65 years to one of two versions of the Talley Bowel Disease Questionnaire, a survey designed to assess the prevalence of functional gastrointestinal disorders (FGID). One version was 2 pages long and the other 4 pages. Using a 2 × 2 factorial design, respondents were randomized to survey length and one of two pre-notification types, letter or postcard; 780 residents ultimately received a survey, after excluding those who had moved outside the county or passed away. Results: Overall, the response rates (RR) did not vary by length of survey (RR = 44.6% for the 2-page survey and 48.4% for the 4-page) or pre-notification type (RR = 46.3% for the letter and 46.8% for the postcard). Differences in response rates by questionnaire length were seen among younger adults, who were more likely to respond to the 4-page than the 2-page questionnaire (RR = 39.0% compared to 21.8% for individuals in their 20s and RR = 49.0% compared to 32.3% for those in their 30s). There were no differences across conditions with respect to item non-response or time (days after mailing) to survey response. Conclusion: This study suggests that the shortest survey does not necessarily provide the best option for increased response rates and survey quality. Pre-notification type (letter or postcard) did not impact response rate, suggesting that postcards may be more beneficial due to the lower associated costs of this method of contact.

  7. Evaluating and Refining the Construct of Sexual Quality With Item Response Theory: Development of the Quality of Sex Inventory.

    Science.gov (United States)

    Shaw, Amanda M; Rogge, Ronald D

    2016-02-01

    This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.

  8. Quality of life assessed with the medical outcomes study short form 36-item health survey of patients on renal replacement therapy: A systematic review and meta-analysis

    NARCIS (Netherlands)

    Y.S. Liem (Ylian Serina); J.L. Bosch (Johanna); L.R. Arends (Lidia); M.H. Heijenbrok-Kal (Majanka); M.G.M. Hunink (Myriam)

    2007-01-01

    Objectives: The Medical Outcomes Study Short Form 36-Item Health Survey (SF-36) is the most widely used generic instrument to estimate quality of life of patients on renal replacement therapy. Purpose of this study was to summarize and compare the published literature on quality of life

  10. Modifying parents peer attachment scale with item response theory

    Institute of Scientific and Technical Information of China (English)

    臧运洪; 赵守盈; 陈维; 潘运; 张禹

    2012-01-01

    The item discrimination, difficulty, and information-function peak parameters of item response theory were used to revise the parent and peer attachment scale developed by Armsden and Greenberg (1991), with the aim of making the revised scale more accurate for surveying the attachment status of Chinese junior high school students. SPSS 15.0 was used to manage the data, MULTILOG 7.03 to estimate the item parameters, and AMOS 4.0 to verify the revised scale. Results: (1) The parent and peer attachment scale is unidimensional and can therefore be revised using item response theory. (2) The item discrimination (a) and difficulty (b) values of the new scale fall within a reasonable range. (3) The peak of the test information function of the new scale is smaller, and the scale has higher reliability. The new father and peer attachment scales each contain two factors: trust and communication. The new mother attachment scale contains the same factors as the original scale: trust, communication, and alienation. In formal administration, the revised scale effectively surveyed the attachment status of Chinese Miao junior high school students.

  11. Handling Protest Responses in Contingent Valuation Surveys.

    Science.gov (United States)

    Pennington, Mark; Gomes, Manuel; Donaldson, Cam

    2017-08-01

    Protest responses, whereby respondents refuse to state the value they place on the health gain, are commonly encountered in contingent valuation (CV) studies, and they tend to be excluded from analyses. Such an approach will be biased if protesters differ from non-protesters on characteristics that predict their responses. The Heckman selection model has been commonly used to adjust for protesters, but its underlying assumptions may be implausible in this context. We present a multiple imputation (MI) approach to appropriately address protest responses in CV studies, and compare it with the Heckman selection model. This study exploits data from the multinational EuroVaQ study, which surveyed respondents' willingness-to-pay (WTP) for a Quality Adjusted Life Year (QALY). Here, our simulation study assesses the relative performance of MI and Heckman selection models across different realistic settings grounded in the EuroVaQ study, including scenarios with different proportions of missing data and non-response mechanisms. We then illustrate the methods in the EuroVaQ study for estimating mean WTP for a QALY gain. We find that MI provides lower bias and mean squared error compared with the Heckman approach across all considered scenarios. The simulations suggest that the Heckman approach can lead to considerable underestimation or overestimation of mean WTP due to violations in the normality assumption, even after log-transforming the WTP responses. The case study illustrates that protesters are associated with a lower mean WTP for a QALY gain compared with non-protesters, but that the results differ according to method for handling protesters. MI is an appropriate method for addressing protest responses in CV studies.

  12. Testing the ruler with item response theory: increasing precision of measurement for relationship satisfaction with the Couples Satisfaction Index.

    Science.gov (United States)

    Funk, Janette L; Rogge, Ronald D

    2007-12-01

    The present study took a critical look at a central construct in couples research: relationship satisfaction. Eight well-validated self-report measures of relationship satisfaction, including the Marital Adjustment Test (MAT; H. J. Locke & K. M. Wallace, 1959), the Dyadic Adjustment Scale (DAS; G. B. Spanier, 1976), and an additional 75 potential satisfaction items, were given to 5,315 online participants. Using item response theory, the authors demonstrated that the MAT and DAS provided relatively poor levels of precision in assessing satisfaction, particularly given the length of those scales. Principal-components analysis and item response theory applied to the larger item pool were used to develop the Couples Satisfaction Index (CSI) scales. Compared with the MAT and the DAS, the CSI scales were shown to have higher precision of measurement (less noise) and correspondingly greater power for detecting differences in levels of satisfaction. The CSI scales demonstrated strong convergent validity with other measures of satisfaction and excellent construct validity with anchor scales from the nomological net surrounding satisfaction, suggesting that they assess the same theoretical construct as do prior scales. Implications for research are discussed.

  13. Psychometric properties of the neck disability index amongst patients with chronic neck pain using item response theory.

    Science.gov (United States)

    Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri

    2017-05-13

    The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation: The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
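
The record above reports internal consistency as Cronbach's alpha but does not show the computation. As a minimal sketch, assuming the standard textbook formula (alpha = k/(k-1) × (1 − sum of item variances / variance of total scores)) and using a hypothetical function name and toy data that do not come from the study, the statistic could be computed as follows:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; not taken from the study above.
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (invented): three respondents, three perfectly consistent items
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

With perfectly consistent items, as in the toy data, alpha equals 1; real questionnaire data would normally give a lower value.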

  14. Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

    Science.gov (United States)

    Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

    2016-04-07

    This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments. © The Author(s) 2016.

  15. A Substantive Process Analysis of Responses to Items from the Multistate Bar Examination

    Science.gov (United States)

    Bonner, Sarah M.; D'Agostino, Jerome V.

    2012-01-01

    We investigated examinees' cognitive processes while they solved selected items from the Multistate Bar Exam (MBE), a high-stakes professional certification examination. We focused on ascertaining those mental processes most frequently used by examinees, and the most common types of errors in their thinking. We compared the relationships between…

  16. Cognitive Diagnostic Models for Tests with Multiple-Choice and Constructed-Response Items

    Science.gov (United States)

    Kuo, Bor-Chen; Chen, Chun-Hua; Yang, Chih-Wei; Mok, Magdalena Mo Ching

    2016-01-01

    Traditionally, teachers evaluate students' abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students' skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic…

  17. Automated Scoring of Constructed-Response Science Items: Prospects and Obstacles

    Science.gov (United States)

    Liu, Ou Lydia; Brew, Chris; Blackmore, John; Gerard, Libby; Madhok, Jacquie; Linn, Marcia C.

    2014-01-01

    Content-based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept-based scoring tool for content-based scoring, c-rater™, for four science items with rubrics…

  18. The Nature of Phonological Awareness throughout the Elementary Grades: An Item Response Theory Perspective

    Science.gov (United States)

    Vloedgraven, Judith; Verhoeven, Ludo

    2009-01-01

    In the present study, the nature of Dutch children's phonological awareness was examined throughout the elementary school grades. Phonological awareness was assessed using five different sets of items that measured rhyming, phoneme identification, phoneme blending, phoneme segmentation, and phoneme deletion. A sample of 1405 children from…

  19. Partially Compensatory Multidimensional Item Response Theory Models: Two Alternate Model Forms

    Science.gov (United States)

    DeMars, Christine E.

    2016-01-01

    Partially compensatory models may capture the cognitive skills needed to answer test items more realistically than compensatory models, but estimating the model parameters may be a challenge. Data were simulated to follow two different partially compensatory models, a model with an interaction term and a product model. The model parameters were…

  20. The Item Response Theory: possible contributions to marketing studies

    Directory of Open Access Journals (Sweden)

    Danielle Ramos de Miranda Pereira

    2011-01-01

    The widespread use of multidimensional scales by marketing researchers motivated this article, which discusses the application of Item Response Theory (IRT) and presents to the field a method that has proved very effective in the estimation of behavioral constructs. The article therefore discusses IRT, highlighting its advances over Classical Test Theory (CTT) and its traditional applications in psychometrics and educational assessment. To verify its applicability to marketing studies, IRT was applied in practice to a scale already widely used by researchers: the market orientation scale (MkTor scale) proposed by Narver and Slater (1990). The results showed that, although the proposed IRT model can be considered satisfactory for application in the context of market orientation, new studies face many challenges, such as constructing a scale with a practical interpretation that indicates what it means for a company to have a level of maturity associated with a given construct. The concluding remarks emphasize that the article's main contribution to marketing studies is the presentation of an alternative method for estimating constructs more accurately and for assessing the quality of scale items.

  1. Modeling Nonignorable Missing Data with Item Response Theory (IRT). Research Report. ETS RR-10-11

    Science.gov (United States)

    Rose, Norman; von Davier, Matthias; Xu, Xueli

    2010-01-01

    Large-scale educational surveys are low-stakes assessments of educational outcomes conducted using nationally representative samples. In these surveys, students do not receive individual scores, and the outcome of the assessment is inconsequential for respondents. The low-stakes nature of these surveys, as well as variations in average performance…

  2. An application of item response theory analysis to alcohol, cannabis, and cocaine criteria in DSM-IV.

    Science.gov (United States)

    Langenbucher, James W; Labouvie, Erich; Martin, Christopher S; Sanjuan, Pilar M; Bavly, Lawrence; Kirisci, Levent; Chung, Tammy

    2004-02-01

    Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview--Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus (B. Muthen & L. Muthen, 1998) and MULTILOG (D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance.

  3. World Health Organization Quality-of-Life Scale (WHOQOL-BREF): Analyses Of Their Item Response Theory Properties Based On The Graded Responses Model

    Directory of Open Access Journals (Sweden)

    Shahrum Vahedi

    2010-11-01

    Objective: This study used Item Response Theory (IRT) to examine the psychometric properties of Health-Related Quality-of-Life. Method: This investigation is a descriptive-analytic study. Subjects were 370 undergraduate students of nursing and midwifery who were selected from Tabriz University of Medical Sciences. All participants were asked to complete the Farsi version of WHOQOL-BREF. Samejima's graded response model was used for the analyses. Results: The results revealed that the discrimination parameters for all items in the four scales were low to moderate. The threshold parameters showed adequate representation of the relevant traits from low to the mean trait level. With the exception of items 15, 18, 24 and 26, all other items showed low item information function values, and thus relatively high reliability from low trait levels to moderate levels. Conclusions: The results of this study indicate that although there was general support for the psychometric properties of the WHOQOL-BREF from an IRT perspective, this measure can be further improved. IRT analyses provided useful measurement information and proved to be a better methodological approach for enhancing our knowledge of the functionality of WHOQOL-BREF.

  4. World Health Organization Quality-of-Life Scale (WHOQOL-BREF): Analyses of Their Item Response Theory Properties Based on the Graded Responses Model

    Science.gov (United States)

    2010-01-01

    Objective: This study used Item Response Theory (IRT) to examine the psychometric properties of Health-Related Quality-of-Life. Method: This investigation is a descriptive-analytic study. Subjects were 370 undergraduate students of nursing and midwifery who were selected from Tabriz University of Medical Sciences. All participants were asked to complete the Farsi version of WHOQOL-BREF. Samejima's graded response model was used for the analyses. Results: The results revealed that the discrimination parameters for all items in the four scales were low to moderate. The threshold parameters showed adequate representation of the relevant traits from low to the mean trait level. With the exception of items 15, 18, 24 and 26, all other items showed low item information function values, and thus relatively high reliability from low trait levels to moderate levels. Conclusions: The results of this study indicate that although there was general support for the psychometric properties of the WHOQOL-BREF from an IRT perspective, this measure can be further improved. IRT analyses provided useful measurement information and proved to be a better methodological approach for enhancing our knowledge of the functionality of WHOQOL-BREF. PMID:22952508
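
The record above names Samejima's graded response model without stating it. As a rough illustration, assuming the usual logistic form in which the probability of responding in category k or higher is 1 / (1 + exp(-a(θ − b_k))), category probabilities can be obtained as differences of adjacent cumulative probabilities. The function name and parameter values below are hypothetical and are not estimates from the study:

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: ordered (increasing) category thresholds b_1 < ... < b_m.
    Returns m + 1 probabilities, one per response category.
    Hypothetical helper for illustration only.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = m + 1)
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the drop between adjacent cumulative curves
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Invented parameters for a five-category item
probs = grm_category_probs(theta=0.0, a=1.2, thresholds=[-2.0, -0.5, 0.5, 2.0])
```

The probabilities sum to 1 by construction, and low discrimination values (such as those the abstract describes as "low to moderate") flatten the cumulative curves so that categories overlap more.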

  5. Assessing Understanding of the Concept of Function: A Study Comparing Prospective Secondary Mathematics Teachers' Responses to Multiple-Choice and Constructed-Response Items

    Science.gov (United States)

    Feeley, Susan Jane

    2013-01-01

    The purpose of this study was to determine whether multiple-choice and constructed-response items assessed prospective secondary mathematics teachers' understanding of the concept of function. The conceptual framework for the study was the Dreyfus and Eisenberg (1982) Function Block. The theoretical framework was Sierpinska's (1992, 1994)…

  6. Survey Response Rates and Survey Administration in Counseling and Clinical Psychology: A Meta-Analysis

    Science.gov (United States)

    Van Horn, Pamela S.; Green, Kathy E.; Martinussen, Monica

    2009-01-01

    This article reports results of a meta-analysis of survey response rates in published research in counseling and clinical psychology over a 20-year span and describes reported survey administration procedures in those fields. Results of 308 survey administrations showed a weighted average response rate of 49.6%. Among possible moderators, response…

  8. Is it nutrients, food items, diet quality or eating behaviours that are responsible for the association of children's diet with sleep?

    Science.gov (United States)

    Khan, Mohammad K A; Faught, Erin L; Chu, Yen Li; Ekwaru, John P; Storey, Kate E; Veugelers, Paul J

    2017-08-01

    Both diet quality and sleep duration of children have declined in the past decades. Several studies have suggested that diet and sleep are associated; however, it is not established which aspects of the diet are responsible for this association. Is it nutrients, food items, diet quality or eating behaviours? We surveyed 2261 grade 5 children on their dietary intake and eating behaviours, and their parents on their sleep duration and sleep quality. We performed factor analysis to identify and quantify the essential factors among 57 nutrients, 132 food items and 19 eating behaviours. We considered these essential factors along with a diet quality score in multivariate regression analyses to assess their independent associations with sleep. Nutrients, food items and diet quality did not exhibit independent associations with sleep, whereas two groupings of eating behaviours did. 'Unhealthy eating habits and environments' was independently associated with sleep. For each standard deviation increase in their factor score, children had 6 min less sleep and were 12% less likely to have sleep of good quality. 'Snacking between meals and after supper' was independently associated with sleep quality. For each standard deviation increase in its factor score, children were 7% less likely to have good quality sleep. This study demonstrates that eating behaviours are responsible for the associations of diet with sleep among children. Health promotion programmes aiming to improve sleep should therefore focus on discouraging eating behaviours such as eating alone or in front of the TV, and snacking between meals and after supper. © 2016 European Sleep Research Society.

  9. Development of new physical activity and sedentary behavior change self-efficacy questionnaires using item response modeling

    Directory of Open Access Journals (Sweden)

    Venditti Elizabeth

    2009-03-01

    Background: Theoretically, increased levels of physical activity self-efficacy (PASE) should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM) was used to develop new physical activity and sedentary behavior change self-efficacy scales. The validity of the new scales was compared with accelerometer assessments of physical activity and sedentary behavior. Methods: New PASE and sedentary behavior change (TV viewing, computer video game use, and telephone use) self-efficacy items were developed. The scales were completed by 714 6th-grade students in seven US cities. A limited number of participants (83) also wore an accelerometer for five days and provided at least 3 full days of complete data. The new scales were analyzed using Classical Test Theory (CTT) and IRM; a reduced set of items was produced with IRM and correlated with accelerometer counts per minute and minutes of sedentary, light and moderate to vigorous activity per day after school. Results: The PASE items discriminated between high and low levels of PASE. Full and reduced scales were weakly correlated (r = 0.18) with accelerometer counts per minute after school for boys, with comparable associations for girls. Weaker correlations were observed between PASE and minutes of moderate to vigorous activity (r = 0.09 to 0.11). The uni-dimensionality of the sedentary scales was established by both exploratory factor analysis and the fit of items to the underlying variable, and reliability was assessed across the length of the underlying variable with some limitations. The reduced sedentary behavior scales had poor reliability. The full scales were moderately correlated with light intensity physical activity after school (r = 0.17 to 0.33) and sedentary behavior (r = -0.29 to -0.12) among the boys, but not for girls. Conclusion: New…

  10. The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

    Science.gov (United States)

    Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

    2017-05-01

    The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants to the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
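
The record above compares 1-, 2-, and 3-parameter logistic models without writing them out. A minimal sketch, assuming the common parameterisation P(correct | θ) = c + (1 − c) / (1 + exp(-a(θ − b))), shows how the three models nest: the 2-PL fixes the guessing parameter c at 0, and the 1-PL (Rasch) model additionally fixes the discrimination a at a common value (here 1). The function name and parameter values are hypothetical, not estimates from the study:

```python
import math


def icc(theta, b, a=1.0, c=0.0):
    """Item characteristic curve for the 3-PL model.

    theta: latent ability; b: item difficulty;
    a: discrimination (1.0 gives the Rasch / 1-PL form used here);
    c: lower asymptote ("guessing"); 0.0 gives the 1-PL or 2-PL form.
    Hypothetical helper for illustration only.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Invented values: the same item evaluated under each nested model
p_1pl = icc(theta=0.5, b=0.0)                # Rasch: a = 1, c = 0
p_2pl = icc(theta=0.5, b=0.0, a=1.8)         # item-specific discrimination
p_3pl = icc(theta=0.5, b=0.0, a=1.8, c=0.2)  # adds a guessing floor
```

When ability equals difficulty, the 1-PL and 2-PL curves both give a probability of 0.5, while the 3-PL curve gives c + (1 − c)/2.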

  11. Detecting Local Item Dependence in Polytomous Adaptive Data

    Science.gov (United States)

    Mislevy, Jessica L.; Rupp, Andre A.; Harring, Jeffrey R.

    2012-01-01

    A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of…

  12. The influence of labels associated with anchor points of Likert-type response scales in survey questionnaires.

    Science.gov (United States)

    Blais, Jean-Guy; Grondin, Julie

    2011-01-01

    Survey questionnaires are among the most used data gathering techniques in the social science researcher's toolbox, and many factors can influence respondents' answers to items and affect data validity. Among these factors, accumulated research demonstrates that the verbal and numeric labels associated with items' response categories in such questionnaires may substantially influence the way in which respondents make their choices within the proposed response format. In line with these findings, this article uses Andrich's Rating Scale model to illustrate what kind of influence the quantifier adverb "totally," used to label or emphasize extreme categories, could have on respondents' answers.

  13. Average vs item response theory scores: an illustration using neighbourhood measures in relation to physical activity in adults with arthritis.

    Science.gov (United States)

    Mielenz, T J; Callahan, L F; Edwards, M C

    2017-01-01

    Our study had two main objectives: 1) to determine whether perceived neighbourhood physical features are associated with physical activity levels in adults with arthritis; and 2) to determine whether the conclusions are more precise when item response theory (IRT) scores are used instead of average scores for the perceived neighbourhood physical features scales. Information on health outcomes, neighbourhood characteristics, and physical activity levels were collected using a telephone survey of 937 participants with self-reported arthritis. Neighbourhood walkability and aesthetic features and physical activity levels were measured by self-report. Adjusted proportional odds models were constructed separately for each neighbourhood physical features scale. We found that among adults with arthritis, poorer perceived neighbourhood physical features (both walkability and aesthetics) are associated with decreased physical activity level compared to better perceived neighbourhood features. This association was only observed in our adjusted models when IRT scoring was employed with the neighbourhood physical feature scales (walkability scale: odds ratio [OR] 1.20, 95% confidence interval [CI] 1.02, 1.41; aesthetics scale: OR 1.32, 95% CI 1.09, 1.62), not when average scoring was used (walkability scale: OR 1.14, 95% CI 1.00, 1.30; aesthetics scale: OR 1.16, 95% CI 1.00, 1.36). In adults with arthritis, those reporting poorer walking and aesthetics features were found to have decreased physical activity levels compared to those reporting better features when IRT scores were used, but not when using average scores. This study may inform public health physical environmental interventions implemented to increase physical activity, especially since arthritis prevalence is expected to be close to 20% of the population in 2020. Based on NIH initiatives, future health research will utilize IRT scores. 
    The differences found in this study may be a precursor for research on how past…

  14. Mortality and health-related quality of life in prevalent dialysis patients: Comparison between 12-items and 36-items short-form health survey

    Directory of Open Access Journals (Sweden)

    Østhus Tone Brit

    2012-05-01

    Background: To assess health-related quality of life (HRQOL) with SF-12 and SF-36 and compare their abilities to predict mortality in chronic dialysis patients, after adjusting for traditional risk factors. Methods: The Short-Form Health Survey (SF-36) with the embedded SF-12 was applied in 301 dialysis patients cross-sectionally. Physical and mental component summary (PCS-36, MCS-36, PCS-12, and MCS-12) scores were calculated. Clinical and demographic data were collected. Mortality (followed for up to 4.5 years) was analyzed with Kaplan-Meier plots and Cox proportional hazards, after censoring for renal transplantation. Exclusion factors were observation time […] Results: In 252 patients (60.2 ± 15.5 years, 65.9% males, dialysis vintage 9.0, IQR 5.0-23.0 months), mortality during follow-up was 33.7% (85 deaths). Significant correlations were observed between PCS-36 and PCS-12 (ρ = 0.93, p […] ρ = 0.95, p […] χ2 = 15.3, p = 0.002) and PCS-36 (χ2 = 16.7, p = 0.001). MCS was not associated with mortality. Adjusted hazard ratios for mortality were 2.5 (95% CI 1.0-6.3, PCS-12) and 2.7 (1.1-6.4, PCS-36) for the lowest compared with the highest (“best perceived”) quartile of PCS. Conclusion: Compromised HRQOL is an independent predictor of poor outcome in dialysis patients. The SF-12 provided similar predictions of mortality as SF-36, and may serve as an applicable clinical tool because it requires less time to complete.

  15. Dropout Rates and Response Times of an Occupation Search Tree in a Web Survey

    Directory of Open Access Journals (Sweden)

    Tijdens Kea

    2014-03-01

    Full Text Available. Occupation is key in socioeconomic research. As in other survey modes, most web surveys use an open-ended question for occupation, though the absence of interviewers elicits unidentifiable or aggregated responses. Unlike other modes, web surveys can use a search tree with an occupation database. Such search trees are hardly ever used, but this may change due to technical advancements. This article evaluates a three-step search tree with 1,700 occupational titles, used in the 2010 multilingual WageIndicator web survey for the UK, Belgium and the Netherlands (22,990 observations). Dropout rates are high: in Step 1 due to unemployed respondents judging the question not to be adequate, and in Step 3 due to search tree item length. Median response times are substantial due to search tree item length, dropout in the next step and invalid occupations ticked. Overall, the validity of the occupation data is rather good: 1.7-7.5% of the respondents completing the search tree ticked an invalid occupation.

  16. Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

    Science.gov (United States)

    Sueiro, Manuel J.; Abad, Francisco J.

    2011-01-01

    The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…

  17. Estimation of a Ramsay-Curve Item Response Theory Model by the Metropolis-Hastings Robbins-Monro Algorithm. CRESST Report 834

    Science.gov (United States)

    Monroe, Scott; Cai, Li

    2013-01-01

    In Ramsay curve item response theory (RC-IRT, Woods & Thissen, 2006) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin's (1981) EM algorithm, which yields maximum marginal likelihood estimates. This method, however,…

  18. Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation.

    Science.gov (United States)

    Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun

    2002-01-01

    Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)

  19. Comparison of Multidimensional Item Response Models: Multivariate Normal Ability Distributions versus Multivariate Polytomous Ability Distributions. Research Report. ETS RR-08-45

    Science.gov (United States)

    Haberman, Shelby J.; von Davier, Matthias; Lee, Yi-Hsuan

    2008-01-01

    Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure in which each item corresponds to a unique dimension of the ability vector, some applications of the two-parameter logistic model to empirical data are employed to…

  20. Increasing Response Rates to Web-Based Surveys

    Science.gov (United States)

    Monroe, Martha C.; Adams, Damian C.

    2012-01-01

    We review a popular method for collecting data--Web-based surveys. Although Web surveys are popular, one major concern is their typically low response rates. Using the Dillman et al. (2009) approach, we designed, pre-tested, and implemented a survey on climate change with Extension professionals in the Southeast. The Dillman approach worked well,…

  1. Testing whether the DSM-5 personality disorder trait model can be measured with a reduced set of items: An item response theory investigation of the Personality Inventory for DSM-5.

    Science.gov (United States)

    Maples, Jessica L; Carter, Nathan T; Few, Lauren R; Crego, Cristina; Gore, Whitney L; Samuel, Douglas B; Williamson, Rachel L; Lynam, Donald R; Widiger, Thomas A; Markon, Kristian E; Krueger, Robert F; Miller, Joshua D

    2015-12-01

    The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) includes an alternative model of personality disorders (PDs) in Section III, consisting in part of a pathological personality trait model. To date, the 220-item Personality Inventory for DSM-5 (PID-5; Krueger, Derringer, Markon, Watson, & Skodol, 2012) is the only extant self-report instrument explicitly developed to measure this pathological trait model. The present study used item response theory-based analyses in a large sample (n = 1,417) to investigate whether a reduced set of 100 items could be identified from the PID-5 that could measure the 25 traits and 5 domains. This reduced set of PID-5 items was then tested in a community sample of adults currently receiving psychological treatment (n = 109). Across a wide range of criterion variables including NEO PI-R domains and facets, DSM-5 Section II PD scores, and externalizing and internalizing outcomes, the correlational profiles of the original and reduced versions of the PID-5 were nearly identical (rICC = .995). These results provide strong support for the hypothesis that an abbreviated set of PID-5 items can be used to reliably, validly, and efficiently assess these personality disorder traits. The ability to assess the DSM-5 Section III traits using only 100 items has important implications in that it suggests these traits could still be measured in settings in which assessment-related resources (e.g., time, compensation) are limited.

  2. 单维项目因素分析:CCFA与IRT估计方法的比较%Unidimensional Item Factor Analysis: A Comparison of Categorical Confirmation Factor Analysis and the Item Response Theory

    Institute of Scientific and Technical Information of China (English)

    刘红云; 李美娟; 骆方; 李小山

    2012-01-01

    Two simulation studies compared the WLSc and MWLSc estimation methods under the SEM framework with the MML/EM estimation method under the IRT framework. The results showed that: (1) of the three methods, WLSc produced the most biased parameter estimates, while MWLSc and MML/EM differed little; (2) the precision of all item parameter estimates improved as sample size increased; (3) the precision of item factor loading and difficulty estimates was affected by test length; (4) the precision of item factor loading and discrimination estimates was affected by the magnitude of their population parameters; (5) the distribution of thresholds among the test items affected the precision of parameter estimates, with item discrimination affected most; (6) overall, item parameter estimates were more precise under the SEM framework than under the IRT framework. English abstract: The factor analysis models and estimation methods for continuous (i.e., interval or ratio scale) data are not appropriate for item-level data that are categorical in nature. The authors provided a brief review and synthesis of the item factor analysis estimation literature for categorical data (e.g., 0-1 type response scales). Popular categorical item factor analysis models and estimation methods found in the structural equation modeling and item response theory literature were presented. Two Monte Carlo simulation studies were conducted and revealed: (1) Similar parameter estimates were obtained from the SEM and IRT parameterizations. Even with a small sample and the IRT estimates converted to SEM parameters, the MWLSc and MML/EM results were found to be strikingly similar. But with a small sample size and long tests, WLSc did not yield convergent parameter estimates. Although WLSc did converge in short tests, its estimates were consistently more discrepant than those yielded by the other estimation techniques. (2) The precision of the estimators increased as the sample size increased. (3) The precision of the item factor loading and item difficulty parameters was influenced by the test length. (4) The precision of

  3. Resistance to Confounding Style and Content in Scoring Constructed-Response Items

    Science.gov (United States)

    Schafer, William D.; Gagne, Phill; Lissitz, Robert W.

    2005-01-01

    An assumption that is fundamental to the scoring of student-constructed responses (e.g., essays) is the ability of raters to focus on the response characteristics of interest rather than on other features. A common example, and the focus of this study, is the ability of raters to score a response based on the content achievement it demonstrates…

  4. What range of trait levels can the Autism-Spectrum Quotient (AQ) measure reliably? An item response theory analysis.

    Science.gov (United States)

    Murray, Aja Louise; Booth, Tom; McKenzie, Karen; Kuenssberg, Renate

    2016-06-01

    It has previously been noted that inventories measuring traits that originated in a psychopathological paradigm can often reliably measure only a very narrow range of trait levels that are near and above clinical cutoffs. Much recent work has, however, suggested that autism spectrum disorder traits are on a continuum of severity that extends well into the nonclinical range. This implies a need for inventories that can capture individual differences in autistic traits from very high levels all the way to the opposite end of the continuum. The Autism-Spectrum Quotient (AQ) was developed based on a closely related rationale, but there has, to date, been no direct test of the range of trait levels that the AQ can reliably measure. To assess this, we fit a bifactor item response theory model to the AQ. Results suggested that AQ measures moderately low to moderately high levels of a general autistic trait with good measurement precision. The reliable range of measurement was significantly improved by scoring the instrument using its 4-point response scale, rather than dichotomizing responses. These results support the use of the AQ in nonclinical samples, but suggest that items measuring very low and very high levels of autistic traits would be beneficial additions to the inventory.

  5. Item Response Theory Analysis of Two Questionnaire Measures of Arthritis-Related Self-Efficacy Beliefs from Community-Based US Samples

    Directory of Open Access Journals (Sweden)

    Thelma J. Mielenz

    2010-01-01

    Full Text Available. Using item response theory (IRT), we examined the Rheumatoid Arthritis Self-efficacy scale (RASE) collected from a People with Arthritis Can Exercise RCT (346 participants) and 2 subscales of the Arthritis Self-efficacy scale (ASE) collected from an Active Living Every Day (ALED) RCT (354 participants) to determine which one better identifies low arthritis self-efficacy in community-based adults with arthritis. The item parameters were estimated in Multilog using the graded response model. The 2 ASE subscales are adequately explained by one factor. There was evidence for 2 locally dependent item pairs; two items from these pairs were removed when we reran the model. The exploratory factor analysis results for RASE showed a multifactor solution which led to a 9-factor solution. In order to perform IRT analysis, one item from each of the 9 subfactors was selected. Both scales were effective at measuring a range of arthritis SE.
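
    The graded response model named in this abstract can be sketched as follows. This is a minimal illustration, not the Multilog estimation the authors ran: the function name and the parameter values are hypothetical, and the sketch only evaluates category probabilities for given item parameters rather than estimating them.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait level (e.g., self-efficacy)
    a: item discrimination
    thresholds: increasing boundaries b_1 < ... < b_{K-1} for K categories
    """
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print(probs)
```

    Because the thresholds must be increasing, the cumulative curves never cross and every category probability stays non-negative.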

  6. Item Banking with Embedded Standards

    Science.gov (United States)

    MacCann, Robert G.; Stanley, Gordon

    2009-01-01

    An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…

  8. Using item response theory to explore the psychometric properties of extended matching questions examination in undergraduate medical education

    Directory of Open Access Journals (Sweden)

    Lawton Gemma

    2005-03-01

    Full Text Available. Background: As assessment has been shown to direct learning, it is critical that the examinations developed to test clinical competence in medical undergraduates are valid and reliable. The use of extended matching questions (EMQ) has been advocated to overcome some of the criticisms of using multiple-choice questions to test factual and applied knowledge. Methods: We analysed the results from the Extended Matching Questions Examination taken by 4th year undergraduate medical students in the academic year 2001 to 2002. Rasch analysis was used to examine whether the set of questions used in the examination mapped on to a unidimensional scale, the degree of difficulty of questions within and between the various medical and surgical specialties and the pattern of responses within individual questions to assess the impact of the distractor options. Results: Analysis of a subset of items and of the full examination demonstrated internal construct validity and the absence of bias on the majority of questions. Three main patterns of response selection were identified. Conclusion: Modern psychometric methods based upon the work of Rasch provide a useful approach to the calibration and analysis of EMQ undergraduate medical assessments. The approach allows for a formal test of the unidimensionality of the questions and thus the validity of the summed score. Given the metric calibration which follows fit to the model, it also allows for the establishment of item banks to facilitate continuity and equity in exam standards.
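
    For reference, the dichotomous Rasch model behind this kind of analysis can be written as below. The symbols are not taken from the abstract and are assumptions for illustration: \theta_n is the ability of student n, b_i the difficulty of question i, x_{ni} the scored response, and r_n the summed score. Under this model r_n is a sufficient statistic for \theta_n, which is why fit to the model supports use of the summed score.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
r_n = \sum_{i=1}^{I} x_{ni}
```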

  9. Psychometric Properties and Responsiveness to Change of 15- and 28-Item Versions of the SCORE: A Family Assessment Questionnaire.

    Science.gov (United States)

    Hamilton, Elena; Carr, Alan; Cahill, Paul; Cassells, Ciara; Hartnett, Dan

    2015-09-01

    The SCORE (Systemic Clinical Outcome and Routine Evaluation) is a 40-item questionnaire for completion by family members 12 years and older to assess outcome in systemic therapy. This study aimed to investigate psychometric properties of two short versions of the SCORE and their responsiveness to therapeutic change. Data were collected at 19 centers from 701 families at baseline and from 433 of these 3-5 months later. Results confirmed the three-factor structure (strengths, difficulties, and communication) of the 15- and 28-item versions of the SCORE. Both instruments had good internal consistency and test-retest reliability. They also showed construct and criterion validity, correlating with measures of parent, child, and family adjustment, and discriminating between clinical and nonclinical cases. Total and factor scales of the SCORE-15 and -28 were responsive to change over 3-5 months of therapy. The SCORE-15 and SCORE-28 are brief psychometrically robust family assessment instruments which may be used to evaluate systemic therapy. © 2015 Family Process Institute.

  10. Impact of different scoring algorithms applied to multiple-mark survey items on outcome assessment: an in-field study on health-related knowledge.

    Science.gov (United States)

    Domnich, A; Panatto, D; Arata, L; Bevilacqua, I; Apprato, L; Gasparini, R; Amicizia, D

    2015-01-01

    Health-related knowledge is often assessed through multiple-choice tests. Among the different types of formats, researchers may opt to use multiple-mark items, i.e. with more than one correct answer. Although multiple-mark items have long been used in the academic setting - sometimes with scant or inconclusive results - little is known about the implementation of this format in research on in-field health education and promotion. A study population of secondary school students completed a survey on nutrition-related knowledge, followed by a single-lecture intervention. Answers were scored by means of eight different scoring algorithms and analyzed from the perspective of classical test theory. The same survey was re-administered to a sample of the students in order to evaluate the short-term change in their knowledge. In all, 286 questionnaires were analyzed. Partial scoring algorithms displayed better psychometric characteristics than the dichotomous rule. In particular, the algorithm proposed by Ripkey and the balanced rule showed greater internal consistency and relative efficiency in scoring multiple-mark items. A penalizing algorithm in which the proportion of marked distracters was subtracted from that of marked correct answers was the only one that highlighted a significant difference in performance between natives and immigrants, probably owing to its slightly better discriminatory ability. This algorithm was also associated with the largest effect size in the pre-/post-intervention score change. The choice of an appropriate rule for scoring multiple-mark items in research on health education and promotion should consider not only the psychometric properties of single algorithms but also the study aims and outcomes, since scoring rules differ in terms of bias, reliability, difficulty, sensitivity to guessing and discrimination.
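
    Two of the scoring rules described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the example item are hypothetical, the dichotomous rule is taken to mean all-or-nothing credit, and the penalizing rule is left unclamped (it may go negative) because the abstract does not say whether the authors floored it at zero.

```python
def dichotomous_score(marked, correct):
    """All-or-nothing rule: credit only if exactly the correct options are marked."""
    return 1.0 if set(marked) == set(correct) else 0.0

def penalized_score(marked, correct, options):
    """Proportion of correct options marked minus proportion of distractors marked.

    Assumes the item has at least one correct option.
    """
    marked, correct = set(marked), set(correct)
    distractors = set(options) - correct
    hit_rate = len(marked & correct) / len(correct)
    # Guard against items with no distractors at all.
    false_rate = len(marked & distractors) / len(distractors) if distractors else 0.0
    return hit_rate - false_rate

# Hypothetical item: five options, two of them correct; the respondent
# marks one correct option and one distractor.
options = ["A", "B", "C", "D", "E"]
correct = ["A", "C"]
marked = ["A", "D"]
print(dichotomous_score(marked, correct))         # no credit
print(penalized_score(marked, correct, options))  # partial credit: 1/2 - 1/3
```

    The example shows why partial rules retain information that the dichotomous rule discards: the same response pattern earns zero under one rule and a positive fraction under the other.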

  11. The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

    Science.gov (United States)

    Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

    2016-04-01

    The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed.

  12. Why Japanese workers show low work engagement: An item response theory analysis of the Utrecht Work Engagement scale

    Directory of Open Access Journals (Sweden)

    Iwata Noboru

    2010-11-01

    Full Text Available. With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being like work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based on the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of both versions was not similar. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees.

  13. Why Japanese workers show low work engagement: An item response theory analysis of the Utrecht Work Engagement scale.

    Science.gov (United States)

    Shimazu, Akihito; Schaufeli, Wilmar B; Miyanaka, Daisuke; Iwata, Noboru

    2010-11-05

    With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being like work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based on the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of both versions was not similar. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees.

  14. A new look at the psychometrics of the parenting scale through the lens of item response theory.

    Science.gov (United States)

    Lorber, Michael F; Xu, Shu; Slep, Amy M Smith; Bulling, Lisanne; O'Leary, Susan G

    2014-01-01

    The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on 2 community samples of cohabiting parents of 3- to 8-year-old children, combined to yield a total sample size of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater 6-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development.

  15. Performance of Accounting students on the Enade/2012 test: an application of the Item-Response Theory

    Directory of Open Access Journals (Sweden)

    Raphael Vinicius Weigert Camargo

    2016-08-01

    Full Text Available. The objective in this study was to measure Accounting students’ performance (proficiency) on the Enade test using Item Response Theory (IRT). The students’ performance was measured using the three-parameter logistic model (3PL), based on data from the Enade/2012 test, taken from the website of the National Institute for Educational Studies and Research Anísio Teixeira (Inep), concerning 47,098 students. Through the scale, three levels of student performance could be distinguished. Level 1 students master the reading and interpretation of texts and quantitative reasoning. In addition, Level 2 students should present logical reasoning and a systemic and holistic perspective. Furthermore, at Level 3, students should present interdisciplinary knowledge, covering accounting contents, critical-analytic skills and practical application of the content mastered. The results also indicated that the items of the Enade test were very difficult for the group that took the test. Regardless of the student characteristics analyzed, the proficiency scores were, overall, very low. This result suggests that the HEIs need to take action and that public policies are needed to help improve students’ performance.
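
    The three-parameter logistic model mentioned in this abstract can be sketched as follows. The function name and parameter values are hypothetical and are not taken from the Enade calibration; the sketch only evaluates the response probability for given parameters, with a as discrimination, b as difficulty and c as the pseudo-guessing lower asymptote.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical difficult item (b = 2.0) with a guessing floor of 0.2:
# a student of average proficiency succeeds only slightly above chance.
print(p_correct_3pl(theta=0.0, a=1.0, b=2.0, c=0.2))
# At theta == b the probability is halfway between c and 1.
print(p_correct_3pl(theta=2.0, a=1.0, b=2.0, c=0.2))
```

    A test whose items mostly have difficulties well above the proficiency of the examinees yields probabilities close to the guessing floor, which is consistent with the abstract's remark that the items were very difficult for this group.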

  16. Disfluencies and gaze aversion in unreliable responses to survey questions

    NARCIS (Netherlands)

    Schober, Michael F.; Conrad, Frederick G.; Dijkstra, Wil; Ongena, Yfke P.

    2012-01-01

    When survey respondents answer survey questions, they can also produce "paradata" (Couper 2000, 2008): behavioral evidence about their response process. The study reported here demonstrates that two kinds of respondent paradata - fluency of speech and gaze direction during answers - identify answers

  17. Living with Smartphones: Does Completion Device Affect Survey Responses?

    Science.gov (United States)

    Lambert, Amber D.; Miller, Angie L.

    2015-01-01

    With the growing reliance on tablets and smartphones for internet access, understanding the effects of completion device on online survey responses becomes increasingly important. This study uses data from the Strategic National Arts Alumni Project, a multi-institution online alumni survey designed to obtain knowledge of arts education, to explore…

  19. A Comparison of Web-Based and Paper-Based Survey Methods: Testing Assumptions of Survey Mode and Response Cost

    Science.gov (United States)

    Greenlaw, Corey; Brown-Welty, Sharon

    2009-01-01

    Web-based surveys have become more prevalent in areas such as evaluation, research, and marketing research to name a few. The proliferation of these online surveys raises the question, how do response rates compare with traditional surveys and at what cost? This research explored response rates and costs for Web-based surveys, paper surveys, and…

  1. Psychometric analysis of Stöber's social desirability scale (SDS-17): an item response theory perspective.

    Science.gov (United States)

    Tran, Ulrich S; Stieger, Stefan; Voracek, Martin

    2012-12-01

    Stöber's Social Desirability Scale (SDS-17) was examined psychometrically in 5 samples (N=2817) from Austria, Canada, and the U.S.A. Rasch and Mokken scaling analyses attested the SDS-17 is not strictly unidimensional. Age, agreeableness, and conscientiousness were notable positive correlates of SDS-17 scores. There were signs of non-normal score distributions, acquiescence bias, and sex and country differences (higher scores among Austrians than North Americans). Items with higher ratings of social desirability according to previous research were particularly prone to show sex effects. The SDS-17 appears suitable in cross-cultural settings, but may benefit from substituting its true-false response format with a rating-scale format. A formative-indicators view regarding the social desirability construct and the SDS-17 is discussed.

  2. HOW WE HAVE USED ITEM RESPONSE THEORY AND CLASSROOM MANAGEMENT TO IMPROVE STUDENT SUCCESS RATES IN LARGE GENERAL CHEMISTRY CLASSES

    Directory of Open Access Journals (Sweden)

    Brock L. Casselman

    Full Text Available. Since 2012 we have tracked general chemistry student success rates at the University of Utah. In efforts to improve those rates we have implemented math prerequisites, changed our discussion session format, introduced metacognitive exercises aimed at the lowest quartile of students and instituted a flipped classroom model. Furthermore, using Item Response Theory we have identified which topics each individual student struggles with on practice tests. These steps have increased our success rates to ~76%. In addition, student performance on nationally normed American Chemical Society final exams has improved to a median at the 86th percentile. Our lowest quartile of students in spring 2016 scored at the 51st percentile, above the national median.

  3. The Elements of Item Response Theory and its Framework in Analyzing Introductory Astronomy College Student Misconceptions. I. Galaxies

    CERN Document Server

    Favia, Andrej; Thorpe, Geoffrey L

    2013-01-01

    This is the first in a series of papers that analyze college student beliefs in realms where common astronomy misconceptions are prevalent. Data was collected through administration of an inventory distributed at the end of an introductory college astronomy course. In this paper, we present the basic mathematics of item response theory (IRT), and then we use it to explore concepts related to galaxies. We show how IRT determines the difficulty of each galaxy topic under consideration. We find that the concept of galaxy spatial distribution presents the greatest challenge to students of all the galaxy topics. We also find and present the most logical sequence to teach galaxy topics as a function of the audience's age.

  4. The Children's Behavior Questionnaire very short scale: psychometric properties and development of a one-item temperament scale.

    Science.gov (United States)

    Sleddens, Ester F C; Hughes, Sheryl O; O'Connor, Teresia M; Beltran, Alicia; Baranowski, Janice C; Nicklas, Theresa A; Baranowski, Tom

    2012-02-01

    Little research has been conducted on the psychometrics of the very short scale (36 items) of the Children's Behavior Questionnaire, and no one-item temperament scale has been tested for use in applied work. In this study, 237 United States caregivers completed a survey to define their child's behavioral patterns (i.e., Surgency, Negative Affectivity, Effortful Control) using both scales. Psychometrics of the 36-item Children's Behavior Questionnaire were examined using classical test theory, principal factor analysis, and item response modeling. Classical test theory analysis demonstrated adequate internal consistency and factor analysis confirmed a three-factor structure. Potential improvements to the measure were identified using item response modeling. A one-item (three response categories) temperament scale was validated against the three temperament factors of the 36-item scale. The temperament response categories correlated with the temperament factors of the 36-item scale, as expected. The one-item temperament scale may be applicable for clinical use.

  5. Multilevel Modeling of Item Position Effects

    Science.gov (United States)

    Albano, Anthony D.

    2013-01-01

    In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. A…

  6. Comparison of response patterns in different survey designs: a longitudinal panel with mixed-mode and online-only design.

    Science.gov (United States)

    Rübsamen, Nicole; Akmatov, Manas K; Castell, Stefanie; Karch, André; Mikolajczyk, Rafael T

    2017-01-01

    Increasing availability of the Internet allows using only online data collection for more epidemiological studies. We compare response patterns in a population-based health survey using two survey designs: mixed-mode (choice between paper-and-pencil and online questionnaires) and online-only design (without choice). We used data from a longitudinal panel, the Hygiene and Behaviour Infectious Diseases Study (HaBIDS), conducted in 2014/2015 in four regions in Lower Saxony, Germany. Individuals were recruited using address-based probability sampling. In two regions, individuals could choose between paper-and-pencil and online questionnaires. In the other two regions, individuals were offered online-only participation. We compared sociodemographic characteristics of respondents who filled in all panel questionnaires between the mixed-mode group (n = 1110) and the online-only group (n = 482). Using 134 items, we performed multinomial logistic regression to compare responses between survey designs in terms of type (missing, "do not know" or valid response) and ordinal regression to compare responses in terms of content. We applied the false discovery rate (FDR) to control for multiple testing and investigated effects of adjusting for sociodemographic characteristics. For validation of the differential response patterns between mixed-mode and online-only, we compared the response patterns between paper and online mode among the respondents in the mixed-mode group in one region (n = 786). Respondents in the online-only group were older than those in the mixed-mode group, but the groups did not differ regarding sex or education. Type of response did not differ between the online-only and the mixed-mode group. Survey design was associated with different content of response in 18 of the 134 investigated items, which decreased to 11 after adjusting for sociodemographic variables. In the validation within the mixed-mode, only two of those were among the 11 significantly

  7. Measuring the Accuracy of Survey Responses using Administrative Register Data

    DEFF Research Database (Denmark)

    Kreiner, Claus Thustrup; Lassen, David Dreyer; Leth-Petersen, Søren

    2015-01-01

    This paper shows how Danish administrative register data can be combined with survey data at the person level and be used to validate information collected in the survey. Register data are collected by automatic third party reporting and the potential errors associated with the two data sources are therefore plausibly orthogonal. Two examples are given to illustrate the potential of combining survey and register data. In the first example expenditure survey records with information about total expenditure are merged with income tax records holding information about income and wealth. Income and wealth data are used to impute total expenditure which is then compared to the survey measure. Results suggest that the two measures match each other well on average. In the second example we compare responses to a one-shot recall question about total gross personal income, collected in another survey...

  8. Working with Missing Data: Imputation of Nonresponse Items in Categorical Survey Data with a Non-Monotone Missing Pattern

    OpenAIRE

    Wilson, Machelle D; Lueck, Kerstin

    2014-01-01

    The imputation of missing data is often a crucial step in the analysis of survey data. This study reviews typical problems with missing data and discusses a method for the imputation of missing survey data with a large number of categorical variables which do not have a monotone missing pattern. We develop a method for constructing a monotone missing pattern that allows for imputation of categorical data in data sets with a large number of variables using a model-based MCMC approach. We repor...

  9. In Search of Motivation for the Business Survey Response Task

    NARCIS (Netherlands)

    Torres van Grinsven, Vanessa; Bolko, Irena; Bavdaz, Mojca

    2014-01-01

    Increasing reluctance of businesses to participate in surveys often leads to declining or low response rates, poor data quality and burden complaints, and suggests that a driving force, that is, the motivation for participation and accurate and timely response, is insufficient or lacking. Inspiratio

  10. In Search of Motivation for the Business Survey Response Task

    NARCIS (Netherlands)

    Torres van Grinsven, Vanessa; Bolko, Irena; Bavdaz, Mojca

    2014-01-01

    Increasing reluctance of businesses to participate in surveys often leads to declining or low response rates, poor data quality and burden complaints, and suggests that a driving force, that is, the motivation for participation and accurate and timely response, is insufficient or lacking.

  11. Maximizing measurement efficiency of behavior rating scales using Item Response Theory: An example with the Social Skills Improvement System - Teacher Rating Scale.

    Science.gov (United States)

    Anthony, Christopher J; DiPerna, James C; Lei, Pui-Wa

    2016-04-01

    Measurement efficiency is an important consideration when developing behavior rating scales for use in research and practice. Although most published scales have been developed within a Classical Test Theory (CTT) framework, Item Response Theory (IRT) offers several advantages for developing scales that maximize measurement efficiency. The current study provides an example of using IRT to maximize rating scale efficiency with the Social Skills Improvement System - Teacher Rating Scale (SSIS - TRS), a measure of student social skills frequently used in practice and research. Based on IRT analyses, 27 items from the Social Skills subscales and 14 items from the Problem Behavior subscales of the SSIS - TRS were identified as maximally efficient. In addition to maintaining similar content coverage to the published version, these sets of maximally efficient items demonstrated similar psychometric properties to the published SSIS - TRS.
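
    The abstract above does not say how efficiency was quantified. One common IRT quantity for comparing items is item information, which for a dichotomous two-parameter logistic (2PL) item is I(θ) = a²·P(θ)·(1 − P(θ)). The sketch below ranks a hypothetical three-item bank by information at one trait level. The 2PL form, the item names, and the parameter values are all assumptions for illustration; the SSIS - TRS items are rating-scale items and the study's actual model is not stated here.

```python
import math

def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item bank: name -> (discrimination a, difficulty b).
items = {"item_a": (0.6, 0.0), "item_b": (1.8, 0.0), "item_c": (1.2, 1.5)}

theta = 0.0  # trait level at which precision matters most (assumed)
ranked = sorted(
    items,
    key=lambda name: item_information(theta, *items[name]),
    reverse=True,
)
print(ranked)  # most informative item first
```

    Under these assumed parameters, a shortened scale targeting θ = 0 would keep the highly discriminating item centred at that level before the others.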

  12. Item Randomized-Response Models for Measuring Noncompliance: Risk-Return Perceptions, Social Influences, and Self-Protective Responses

    Science.gov (United States)

    Bockenholt, Ulf; Van Der Heijden, Peter G. M.

    2007-01-01

    Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied because: (i) of its lower efficiency and the resulting need for larger sample sizes which make applications of RR costly; (ii) despite its privacy-protection mechanism the RR design may not be followed by every respondent; and…

  13. Responses to catastrophic AGI risk: a survey

    Science.gov (United States)

    Sotala, Kaj; Yampolskiy, Roman V.

    2015-01-01

    Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next twenty to one hundred years. It has been suggested that AGI may inflict serious damage to human well-being on a global scale (‘catastrophic risk’). After summarizing the arguments for why AGI may pose such a risk, we review the field's proposed responses to AGI risk. We consider societal proposals, proposals for external constraints on AGI behaviors and proposals for creating AGIs that are safe due to their internal design.

  14. Psychometric properties of the SDM-Q-9 questionnaire for shared decision-making in multiple sclerosis: item response theory modelling and confirmatory factor analysis.

    Science.gov (United States)

    Ballesteros, Javier; Moral, Ester; Brieva, Luis; Ruiz-Beato, Elena; Prefasi, Daniel; Maurino, Jorge

    2017-04-22

    Shared decision-making is a cornerstone of patient-centred care. The 9-item Shared Decision-Making Questionnaire (SDM-Q-9) is a brief self-assessment tool for measuring patients' perceived level of involvement in decision-making related to their own treatment and care. Information related to the psychometric properties of the SDM-Q-9 for multiple sclerosis (MS) patients is limited. The objective of this study was to assess the performance of the items composing the SDM-Q-9 and its dimensional structure in patients with relapsing-remitting MS. A non-interventional, cross-sectional study in adult patients with relapsing-remitting MS was conducted in 17 MS units throughout Spain. A nonparametric item response theory (IRT) analysis was used to assess the latent construct and dimensional structure underlying the observed responses. A parametric IRT model, General Partial Credit Model, was fitted to obtain estimates of the relationship between the latent construct and item characteristics. The unidimensionality of the SDM-Q-9 instrument was assessed by confirmatory factor analysis. A total of 221 patients were studied (mean age = 42.1 ± 9.9 years, 68.3% female). Median Expanded Disability Status Scale score was 2.5 ± 1.5. Most patients reported taking part in each step of the decision-making process. Internal reliability of the instrument was high (Cronbach's α = 0.91) and the overall scale scalability score was 0.57, indicative of a strong scale. All items, except for item 1, showed scalability indices higher than 0.30. Four items (items 6 through to 9) conveyed more than half of the SDM-Q-9 overall information (67.3%). The SDM-Q-9 was a good fit for a unidimensional latent structure (comparative fit index = 0.98, root-mean-square error of approximation = 0.07). All freely estimated parameters were statistically significant (P 0.40) with the exception of item 1 which presented the lowest loading (0.26). Items 6 through to 8 were the

  15. Non-response in a survey among immigrants in Denmark

    DEFF Research Database (Denmark)

    Deding, Mette; Fridberg, Torben; Jakobsen, Vibeke

    The purpose of this paper is to study how various characteristics of respondents and interviewers affect non-response among immigrants. We use a survey conducted among immigrants in Denmark and ethnic Danes. First, we analyse the determinants of overall non-response. Second, we analyse how the determinants of contact and of response given contact differ. We find that characteristics of the respondents are important for the response rate – especially they are important for the probability of getting in contact with the respondent. The lower probability of response among immigrants compared to ethnic...

  16. Response bias in job satisfaction surveys: English general practitioners

    OpenAIRE

    Gravelle, H.; Hole, A. R.; Hussein, I.

    2008-01-01

    Job satisfaction may affect the propensity to respond to job satisfaction surveys, so that estimates of average satisfaction and the effects of determinants of satisfaction may be biased. We examine response bias using data from a postal job satisfaction survey of family doctors. We link all the sampled doctors to an administrative database and so have information on the characteristics of responders and non-responders. Allowing for selection increases the estimate of mean job satisfaction in...

  17. In Search of Motivation for the Business Survey Response Task

    Directory of Open Access Journals (Sweden)

    Torres van Grinsven, Vanessa

    2014-12-01

    Increasing reluctance of businesses to participate in surveys often leads to declining or low response rates, poor data quality and burden complaints, and suggests that a driving force, that is, the motivation for participation and accurate and timely response, is insufficient or lacking. Inspiration for ways to remedy this situation has already been sought in the psychological theory of self-determination; previous research has favored enhancement of intrinsic motivation compared to extrinsic motivation. Traditionally however, enhancing extrinsic motivation has been pervasive in business surveys. We therefore review this theory in the context of business surveys using empirical data from the Netherlands and Slovenia, and suggest that extrinsic motivation calls for at least as much attention as intrinsic motivation, that other sources of motivation may be relevant besides those stemming from the three fundamental psychological needs (competence, autonomy and relatedness), and that other approaches may have the potential to better explain some aspects of motivation in business surveys (e.g., implicit motives). We conclude with suggestions that survey organizations can consider when attempting to improve business survey response behavior.

  18. Developmental changes in reading do not alter the development of visual processing skills: An application of explanatory item response models in grades K-2

    Directory of Open Access Journals (Sweden)

    Kristi L Santi

    2015-02-01

    Visual processing has been widely studied in regard to its impact on a student’s ability to read. A less researched area is the role of reading in the development of visual processing skills. A cohort-sequential, accelerated-longitudinal design was utilized with 932 kindergarten, first, and second grade students to examine the impact of reading acquisition on the processing of various types of visual discrimination and visual motor test items. Students were assessed four times per year on a variety of reading measures and reading precursors and two popular measures of visual processing over a three-year period. Explanatory item response models were used to examine the roles of person and item characteristics on changes in visual processing abilities and changes in item difficulties over time. Results showed different developmental patterns for five types of visual processing test items, but most importantly failed to show consistent effects of learning to read on changes in item difficulty. Thus, the present study failed to find support for the hypothesis that learning to read alters performance on measures of visual processing. Rather, visual processing and reading ability improved together over time with no evidence to suggest cross-domain influences from reading to visual processing. Results are discussed in the context of developmental theories of visual processing and brain-based research on the role of visual skills in learning to read.

  19. Development of Survey Scales for Measuring Exposure and Behavioral Responses to Disruptive Intraoperative Behavior.

    Science.gov (United States)

    Villafranca, Alexander; Hamlin, Colin; Rodebaugh, Thomas L; Robinson, Sandra; Jacobsohn, Eric

    2017-09-10

    Disruptive intraoperative behavior has detrimental effects to clinicians, institutions, and patients. How clinicians respond to this behavior can either exacerbate or attenuate its effects. Previous investigations of disruptive behavior have used survey scales with significant limitations. The study objective was to develop appropriate scales to measure exposure and responses to disruptive behavior. We obtained ethics approval. The scales were developed in a sequence of steps. They were pretested using expert reviews, computational linguistic analysis, and cognitive interviews. The scales were then piloted on Canadian operating room clinicians. Factor analysis was applied to half of the data set for question reduction and grouping. Item response analysis and theoretical reviews ensured that important questions were not eliminated. Internal consistency was evaluated using Cronbach α. Model fit was examined on the second half of the data set using confirmatory factor analysis. Content validity of the final scales was re-evaluated. Consistency between observed relationships and theoretical predictions was assessed. Temporal stability was evaluated on a subsample of 38 respondents. A total of 1433 and 746 clinicians completed the exposure and response scales, respectively. Content validity indices were excellent (exposure = 0.96, responses = 1.0). Internal consistency was good (exposure = 0.93, responses = 0.87). Correlations between the exposure scale and secondary measures were consistent with expectations based on theory. Temporal stability was acceptable (exposure = 0.77, responses = 0.73). We have developed scales measuring exposure and responses to disruptive behavior. They generate valid and reliable scores when surveying operating room clinicians, and they overcome the limitations of previous tools. These survey scales are freely available.
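
    The internal-consistency statistic reported above, Cronbach's α, can be illustrated with a short sketch using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents, 4 items scored 1-5.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

    Values closer to 1 indicate that the items vary together; when every item gives identical scores for each respondent, α equals exactly 1.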

  20. Immediate List Recall as a Measure of Short-Term Episodic Memory: Insights from the Serial Position Effect and Item Response Theory

    Science.gov (United States)

    Gavett, Brandon E.; Horwitz, Julie E.

    2012-01-01

    The serial position effect shows that two interrelated cognitive processes underlie immediate recall of a supraspan word list. The current study used item response theory (IRT) methods to determine whether the serial position effect poses a threat to the construct validity of immediate list recall as a measure of verbal episodic memory. Archival data were obtained from a national sample of 4,212 volunteers aged 28–84 in the Midlife Development in the United States study. Telephone assessment yielded item-level data for a single immediate recall trial of the Rey Auditory Verbal Learning Test (RAVLT). Two parameter logistic IRT procedures were used to estimate item parameters and the Q1 statistic was used to evaluate item fit. A two-dimensional model better fit the data than a unidimensional model, supporting the notion that list recall is influenced by two underlying cognitive processes. IRT analyses revealed that 4 of the 15 RAVLT items (1, 12, 14, and 15) were misfit (p < .05). Item characteristic curves for items 14 and 15 decreased monotonically, implying an inverse relationship between the ability level and the probability of recall. Elimination of the four misfit items provided better fit to the data and met necessary IRT assumptions. Performance on a supraspan list learning test is influenced by multiple cognitive abilities; failure to account for the serial position of words decreases the construct validity of the test as a measure of episodic memory and may provide misleading results. IRT methods can ameliorate these problems and improve construct validity. PMID:22138320
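
    As a minimal sketch of the two-parameter logistic (2PL) model named in the abstract above, the code below evaluates an item characteristic curve, P(θ) = 1 / (1 + exp(−a(θ − b))), where a is the item discrimination and b the item difficulty. The function name `icc_2pl` and all parameter values are illustrative assumptions, not estimates from the study. A negative discrimination produces a monotonically decreasing curve, which is the pattern the abstract describes for items 14 and 15.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of recalling an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Typical item: positive discrimination, so recall rises with ability.
increasing = [icc_2pl(t, a=1.2, b=0.0) for t in abilities]

# Misfitting item (assumed values): negative discrimination, so recall
# probability falls as ability rises.
decreasing = [icc_2pl(t, a=-0.8, b=0.0) for t in abilities]

for t, up, down in zip(abilities, increasing, decreasing):
    print(f"theta={t:+.1f}  a>0: {up:.3f}  a<0: {down:.3f}")
```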

  1. Development of a scale to measure the entrepreneurial potential using the Item Response Theory (IRT)

    Directory of Open Access Journals (Sweden)

    Luciano Ricardo Rath Alves

    2011-01-01

    Several variables are related to the development of entrepreneurial activities. An important one among them is the entrepreneurial agent. This study is one of many that contribute to the understanding of the entrepreneurial agent. In its line of thought, it upholds the idea that the entrepreneur has characteristics and personality traits that stand out from the general population and that are favorable to the success of the entrepreneurship. This study aims at developing a measurement scale for entrepreneurial potential using the Item Response Theory. The items were generated by Santos (2008) based on a theoretical model... The two-parameter logistic IRT model was used. Parameter estimates were obtained from a sample of 764 people who responded to an instrument composed of 103 items. The test information and standard error curves, together with a qualitative interpretation of the scale levels, made it possible to determine the most appropriate range for using the instrument. The results showed that the scale is best suited to assessing individuals with low to moderately high entrepreneurial potential. It is therefore suggested that new items be added to the instrument to measure and interpret even higher levels. Item Response Theory allows new items to be calibrated to measure entrepreneurs with high entrepreneurial potential while making use of the data already obtained.

  2. Proposal of tool to assess the satisfaction of bank customers using the Item Response Theory

    Directory of Open Access Journals (Sweden)

    Alceu Balbim Junior

    2011-01-01

    This paper presents a model for assessing the satisfaction of bank customers using the Item Response Theory (IRT). Organizations are constantly making effort to satisfy customers seeking to remain competitive. Several studies have reported on the relationship between perceived quality, satisfaction, and loyalty. The assessment of satisfaction can be accomplished through the perceived quality, and the development of assessment tools should address specific features of the activity in question. Based on articles that assess the satisfaction of bank customers, this study proposes an assessment tool consisting of 29 items. The items were applied to 240 clients to assess their satisfaction with the bank with which they have the closest relationship. Using Item Response Theory, the item parameters and the information curve were identified. Analysis of the items' degree of discrimination indicated that all of them are appropriate. The information curve obtained showed the interval in which the instrument gives the best estimates of satisfaction levels. The study presented the mean satisfaction level of the sample and the concentration of customers at the different satisfaction levels of the scale.

  3. Using Procedure Based on Item Response Theory to Evaluate Classification Consistency Indices in the Practice of Large-Scale Assessment

    Directory of Open Access Journals (Sweden)

    Shanshan Zhang

    2017-09-01

    In spite of the growing interest in methods for evaluating classification consistency (CC) indices, only a few studies have applied these methods in the practice of large-scale educational assessment. In addition, only a few studies have considered the influence of practical factors, for example, the examinee ability distribution, the cut score location and the score scale, on the performance of CC indices. Using the newly developed Lee's procedure based on item response theory (IRT), the main purpose of this study is to investigate the performance of CC indices when practical factors are taken into consideration. A simulation study and an empirical study were conducted under comprehensive conditions. Results suggested that with a negatively skewed distribution, the CC indices were larger than with other distributions. Interactions occurred among ability distribution, cut score location, and score scale. Consequently, Lee's IRT procedure is reliable for use in large-scale educational assessment, and the reported indices should be treated with caution, as testing conditions may vary widely.

  4. Working with Missing Data: Imputation of Nonresponse Items in Categorical Survey Data with a Non-Monotone Missing Pattern

    Directory of Open Access Journals (Sweden)

    Machelle D. Wilson

    2014-01-01

    The imputation of missing data is often a crucial step in the analysis of survey data. This study reviews typical problems with missing data and discusses a method for the imputation of missing survey data with a large number of categorical variables which do not have a monotone missing pattern. We develop a method for constructing a monotone missing pattern that allows for imputation of categorical data in data sets with a large number of variables using a model-based MCMC approach. We report the results of imputing the missing data from a case study, using educational, sociopsychological, and socioeconomic data from the National Latino and Asian American Study (NLAAS). We report the results of multiply imputed data on a substantive logistic regression analysis predicting socioeconomic success from several educational, sociopsychological, and familial variables. We compare the results of conducting inference using a single imputed data set to those using a combined test over several imputations. Findings indicate that, for all variables in the model, all of the single tests were consistent with the combined test.

  5. Development of the Perinatal Depression Inventory (PDI)-14 using item response theory: a comparison of the BDI-II, EPDS, PDI, and PHQ-9.

    Science.gov (United States)

    Brodey, Benjamin B; Goodman, Sherryl H; Baldasaro, Ruth E; Brooks-DeWeese, Amy; Wilson, Melanie Elliott; Brodey, Inger S B; Doyle, Nora M

    2016-04-01

    The objective of this study is to develop a simple, brief, self-report perinatal depression inventory that accurately measures severity in a number of populations. Our team developed 159 Likert-scale perinatal depression items using simple sentences with a fifth-grade reading level. Based on iterative cognitive interviewing (CI), an expert panel improved and winnowed the item pool based on pre-determined criteria. The resulting 67 items were administered to a sample of 628 pregnant and 251 postpartum women with different levels of depression at private and public sector obstetrics clinics, together with the Beck Depression Inventory (BDI-II), Edinburgh Postpartum Depression Scale (EPDS), and the Patient Health Questionnaire (PHQ-9), as well as Module A of the Structured Clinical Interview for DSM-IV Diagnoses (SCID). Responses were evaluated using Item Response Theory (IRT). The Perinatal Depression Inventory (PDI)-14 items are highly informative regarding depression severity and function similarly and informatively across pregnant/postpartum, white/non-white, and private-clinic/public-clinic populations. PDI-14 scores correlate well with the PHQ-9, EPDS, and BDI-II, but the PDI-14 provides a more precise measure of severity using far fewer words. The PDI-14 is a brief depression assessment that excels at accurately measuring depression severity across a wide range of severity and perinatal populations.

  6. Examining the Impact of Unscorable Item Responses on the Validity and Interpretability of MMPI-2/MMPI-2-RF Restructured Clinical (RC) Scale Scores

    Science.gov (United States)

    Dragon, Wendy R.; Ben-Porath, Yossef S.; Handel, Richard W.

    2012-01-01

    This article examined the impact of unscorable item responses on the psychometric validity and practical interpretability of scores on the Restructured Clinical (RC) Scales of the Minnesota Multiphasic Personality Inventory-2/Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2/MMPI-2-RF). In analyses conducted with five…

  7. A General Program for Item-Response Analysis That Employs the Stabilized Newton-Raphson Algorithm. Research Report. ETS RR-13-32

    Science.gov (United States)

    Haberman, Shelby J.

    2013-01-01

    A general program for item-response analysis is described that uses the stabilized Newton-Raphson algorithm. This program is written to be compliant with Fortran 2003 standards and is sufficiently general to handle independent variables, multidimensional ability parameters, and matrix sampling. The ability variables may be either polytomous or…

  8. Teacher Collective Bargaining in Washington State: Assessing the Internal Validity of Partial Independence Item Response Measures of Contract Restrictiveness. CEDR Working Paper No. 2012 3.0

    Science.gov (United States)

    Goldhaber, Dan; Lavery, Lesley; Theobald, Roddy; D'Entremont, Dylan; Fang, Yangru

    2012-01-01

    Recent research (Strunk and Reardon forthcoming) applies Partial Independence Item Response (PIIR) models to teacher bargaining agreements in California to calculate the latent restrictiveness of these contracts. Further research (Strunk and Grissom 2010; Strunk forthcoming) tests the external validity of these estimates. Given that much research…

  9. An item response theory analysis of Harter's Self-Perception Profile for Children or why strong clinical scales should be distrusted

    NARCIS (Netherlands)

    Egberink, Iris J. L.; Meijer, Rob R.

    2011-01-01

    The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343) w

  10. Innovative application of a multidimensional item response model in assessing the influence of social desirability on the pseudo-relationship between self-efficacy and behavior

    Science.gov (United States)

    This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10–14 years)...

  11. An Item Response Theory-Based, Computerized Adaptive Testing Version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

    Science.gov (United States)

    Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe

    2016-01-01

    Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…

  12. The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

    Science.gov (United States)

    Wang, Zhen; Yao, Lihua

    2013-01-01

    The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…

  13. Investigating the Population Sensitivity Assumption of Item Response Theory True-Score Equating across Two Subgroups of Examinees and Two Test Formats

    Science.gov (United States)

    von Davier, Alina A.; Wilson, Christine

    2008-01-01

    Dorans and Holland (2000) and von Davier, Holland, and Thayer (2003) introduced measures of the degree to which an observed-score equating function is sensitive to the population on which it is computed. This article extends the findings of Dorans and Holland and of von Davier et al. to item response theory (IRT) true-score equating methods that…

  14. An item response theory analysis of Harter’s self-perception profile for children or why strong clinical scales should be distrusted

    NARCIS (Netherlands)

    Egberink, I.J.L.; Meijer, R.R.

    2011-01-01

    The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343)

  15. An item response theory analysis of Harter's Self-Perception Profile for Children or why strong clinical scales should be distrusted

    NARCIS (Netherlands)

    Egberink, Iris J. L.; Meijer, Rob R.

    The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343)

  17. Item Response Theory. Research Report. ETS RR-13-28. ETS R&D Scientific and Policy Contributions Series. ETS SPC-13-05

    Science.gov (United States)

    Carlson, James E.; von Davier, Matthias

    2013-01-01

    Few would doubt that ETS researchers have contributed more to the general topic of item response theory (IRT) than individuals from any other institution. In this report, we briefly review most of those contributions, dividing them into sections by decades of publication, beginning with early work by Fred Lord and Bert Green in the 1950s and…

  18. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    Science.gov (United States)

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…

  19. Corporate Social Responsibility in Engineering Education. A French Survey

    Science.gov (United States)

    Didier, C.; Huet, R.

    2008-01-01

    In this paper, we present and discuss the results of a survey of how corporate social responsibility (CSR) is being discussed and taught in engineering education in France. We shall first describe how those questions have been recently tackled in various programmes of higher education in France. We shall also analyse what faculty members have to…

  20. Response to ERIS 2014 States' Research Needs Survey

    Science.gov (United States)

    This document is ORD’s response to the states’ needs and priorities, as identified in the 2014 survey. ORD identified existing methods, models, tools and databases on these topics, as well as near-term research and development efforts, that could assist states in thei...

  1. The Department Head: A Survey of Duties and Responsibilities.

    Science.gov (United States)

    Papalia, Anthony

    This study surveys 107 foreign language departments in secondary schools in western New York and identifies duties and practices of those responsible for the departmental leadership. The report also determines the amount of released time granted to perform departmental duties. The educational preparation and work experience of supervisory staff…

  3. Multiple-choice versus open-ended response formats of reading test items: A two-dimensional IRT analysis

    Directory of Open Access Journals (Sweden)

    Dominique P. Rauch

    2010-12-01

    The dimensionality of a reading comprehension assessment with non-stem equivalent multiple-choice (MC) items and open-ended (OE) items was analyzed using German test data from 8523 9th-graders. We found that a two-dimensional IRT model with within-item multidimensionality, where MC and OE items load on a general latent dimension and OE items additionally load on a nested latent dimension, had a superior fit compared to a unidimensional model (p ≤ .05). Correlations between general cognitive abilities, orthography and vocabulary and the general latent dimension were significantly higher than with the nested latent dimension (p ≤ .05). Drawing on experimental studies of the effect of item format on reading processes, we suppose that the general latent dimension measures abilities necessary to master basic reading processes and the nested latent dimension captures abilities necessary to master higher reading processes. Including gender, language spoken at home, and school track as predictors in latent regression models showed that the well-known advantage of girls and mother-tongue students is found only for the nested latent dimension.

  4. Harmonization of Neuroticism and Extraversion phenotypes across inventories and cohorts in the Genetics of Personality Consortium: an application of Item Response Theory

    DEFF Research Database (Denmark)

    van den Berg, S. M.; de Moor, M. H. M.; McGue, Matt

    2014-01-01

    -analyses can be employed. Within the Genetics of Personality Consortium, we demonstrate for two clinically relevant personality traits, Neuroticism and Extraversion, how Item-Response Theory (IRT) can be applied to map item data from different inventories to the same underlying constructs. Personality item...... data were analyzed in > 160,000 individuals from 23 cohorts across Europe, USA and Australia in which Neuroticism and Extraversion were assessed by nine different personality inventories. Results showed that harmonization was very successful for most personality inventories and moderately successful...... for some. Neuroticism and Extraversion inventories were largely measurement invariant across cohorts, in particular when comparing cohorts from countries where the same language is spoken. The IRT-based scores for Neuroticism and Extraversion were heritable (48 and 49 %, respectively, based on a meta...

  5. Bounds on Quantiles in the Presence of Full and Partial Item Nonresponse

    NARCIS (Netherlands)

    Vazquez-Alvarez, R.; Melenberg, B.; van Soest, A.H.O.

    1999-01-01

    Microeconomic surveys are usually subject to the problem of item nonresponse, typically associated with variables like income and wealth, where confidentiality and/or lack of accurate information can affect the response behavior of the individual. Follow up categorical questions can reduce item nonr

  6. "Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests": Correction to Fox, Berry, and Freeman (2014).

    Science.gov (United States)

    2016-08-01

    Reports an error in "Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests" by Mark C. Fox, Jane M. Berry and Sara P. Freeman (Psychology and Aging, 2014[Dec], Vol 29[4], 925-938). In the article, unneeded zeros were inadvertently included at the beginnings of some numbers in Tables 1–4. In addition, the right column in Table 4 includes three unnecessary zeros after asterisks. (The following abstract of the original article appeared in record 2014-49140-001.) Relatively high vocabulary scores of older adults are generally interpreted as evidence that older adults possess more of a common ability than younger adults. Yet, this interpretation rests on empirical assumptions about the uniformity of item-response functions between groups. In this article, we test item response models of differential responding against datasets containing younger-, middle-aged-, and older-adult responses to three popular vocabulary tests (the Shipley, Ekstrom, and WAIS–R) to determine whether members of different age groups who achieve the same scores have the same probability of responding in the same categories (e.g., correct vs. incorrect) under the same conditions. Contrary to the null hypothesis of measurement invariance, datasets for all three tests exhibit substantial differential responding. Members of different age groups who achieve the same overall scores exhibit differing response probabilities in relation to the same items (differential item functioning) and appear to approach the tests in qualitatively different ways that generalize across items. Specifically, younger adults are more likely than older adults to leave items unanswered for partial credit on the Ekstrom, and to produce 2-point definitions on the WAIS–R. Yet, older adults score higher than younger adults, consistent with most reports of vocabulary outcomes in the cognitive aging literature. In light of these findings, the most generalizable

  7. The Retrospect and Prospect of Non-parametric Item Response Theory

    Institute of Scientific and Technical Information of China (English)

    陈婧; 康春花; 钟晓玲

    2013-01-01

    Compared with parametric item response theory, non-parametric item response theory provides a theoretical framework that is better suited to practical settings. Current research on non-parametric item response theory focuses mainly on parameter estimation methods and their comparison, and on the verification of data-model fit. Applied research concentrates on scale revision and on the analysis of personality data and differential item functioning. Non-parametric cognitive diagnostic theory, which developed on the basis of cognitive diagnostic theory, further highlights these applied advantages. Future studies should place more emphasis on the practical application of non-parametric item response theory, and non-parametric cognitive diagnostic theory also deserves attention, so that the advantages of non-parametric methods can be fully used in practice.

  8. Do postage stamps versus pre-paid envelopes increase responses to patient mail surveys? A randomised controlled trial

    Directory of Open Access Journals (Sweden)

    Campbell Malcolm

    2008-05-01

    Background: Studies, largely from the market research field, suggest that including a stamped addressed envelope, rather than a pre-paid business reply envelope, increases the response rate to mail surveys. Evidence that this also holds for patient mail surveys is limited. Methods: The aim of this study was to investigate whether stamped addressed envelopes increase response rates to patient mail surveys compared to pre-paid business reply envelopes, and to compare the relative costs. A sample of 477 initial non-responders to a mail survey of patients attending breast clinics in Greater Manchester between 1/10/2002 and 31/7/2003 was entered into the trial: 239 were randomly allocated to receive a stamped envelope and 238 to receive a pre-paid envelope with their reminder surveys. Overall cost and cost per item returned were calculated. Results: The response in the stamped envelope group was 31.8% (95% CI: 25.9% – 37.7%) compared to 26.9% (21.3% – 32.5%) for the pre-paid group. The difference (4.9%; 95% CI: -3.3% – 13.1%) is not significant at α = 0.05 (χ² = 1.39; 2-tailed test, d.f. = 1; P = 0.239). The stamped envelopes were cheaper in terms of cost per returned item (£1.20) than the pre-paid envelopes (£1.67). However, if the set-up cost for the licence to use the pre-paid service is excluded, the stamped envelopes are more expensive than pre-paid returns (£1.20 versus £0.73). Conclusion: Compared with pre-paid business replies, stamped envelopes did not produce a statistically significant increase in response rate to this patient survey. However, the response gain of the stamped strategy (4.9%) is similar to that demonstrated in a Cochrane review (5.3%) of strategies to increase response to general mail surveys. Further studies and meta-analyses of patient responses to mail surveys via stamped versus pre-paid envelopes are needed, with sufficient power to detect response gains of this magnitude in a patient population.
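
    The interval and test statistic reported in this record can be checked with a short calculation. The sketch below is only an illustration, not the authors' analysis: the responder counts (76 of 239 and 64 of 238) are assumptions back-calculated from the reported percentages, and the function name is hypothetical. It assumes a Wald interval for the difference in proportions and a Pearson chi-square test without continuity correction.

```python
import math

def compare_response_rates(x1, n1, x2, n2, z=1.96):
    """Wald CI for p1 - p2 and Pearson chi-square (1 d.f., no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error for the confidence interval.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z * se, diff + z * se)
    # Pooled proportion for the chi-square test of equal response rates.
    pooled = (x1 + x2) / (n1 + n2)
    chi2 = diff ** 2 / (pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # For 1 d.f., the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return diff, ci, chi2, p_value

# Assumed counts, inferred from the reported 31.8% and 26.9%.
diff, ci, chi2, p_value = compare_response_rates(76, 239, 64, 238)
print(f"difference = {diff:.1%}, 95% CI = ({ci[0]:.1%}, {ci[1]:.1%})")
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
```

    Under these assumed counts the computed values should agree closely with the figures quoted in the abstract; a different interval method or a continuity correction would give slightly different numbers.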

  9. [Perceptions on item disclosure for the Korean medical licensing examination].

    Science.gov (United States)

    Yang, Eunbae B

    2015-09-01

    This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.

  10. Non-Response in Student Surveys: The Role of Demographics, Engagement and Personality

    Science.gov (United States)

    Porter, Stephen R.; Whitcomb, Michael E.

    2005-01-01

    What causes a student to participate in a survey? This paper looks at participation across multiple surveys to understand survey non-response; by using multiple surveys we minimize the impact of survey salience. Students at a selective liberal arts college were administered four different surveys throughout the 2002-2003 academic year, and we use…

  11. Cultural Resources Intensive Survey and Testing of Mississippi River Levee Berms, Crittenden and Desha Counties, Arkansas and Mississippi, Scott, Cape Girardeau and Pemiscot Counties, Missouri Item R-618 Knowlton; Desha County, Arkansas.

    Science.gov (United States)

    1983-11-01

    distribution of cultural resources within the project area. In addition, information obtained in the background and literature search should be of such scope...DAC0W66-83-C-0030, Item R-618, to conduct a background, archival and literature search, and an intensive resources survey of the project area of proposed...seepage through the levee during periods of flooding. The area surveyed included: 152.4 meters (500 feet) right-of-way perpendicular and landside from

  12. Developing a Numerical Ability Test for Students of Education in Jordan: An Application of Item Response Theory

    Science.gov (United States)

    Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader

    2016-01-01

    The purpose of the present study is to develop a test to measure the numerical ability of students of education. The sample of the study consisted of 504 students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed that acceptable psychometric properties of the test;…

  13. Item response theory was used to shorten EORTC QLQ-C30 scales for use in palliative care

    NARCIS (Netherlands)

    M.A. Petersen; M. Groenvold; N. Aaronson; J. Blazeby; Y. Brandberg; A. de Graeff; P. Fayers; E. Hammerlid; M. Sprangers; G. Velikova; J.B. Bjorner

    2006-01-01

    Background and Objective: The goal was to develop a shortened version of the EORTC QLQ-C30 for use in palliative care. We wanted to keep as few items as possible in each scale while still being able to compare results with studies using the original scales. We examined the possibilities of shortenin

  14. The D-Optimality Item Selection Criterion in the Early Stage of CAT: A Study with the Graded Response Model

    Science.gov (United States)

    Passos, Valeria Lima; Berger, Martijn P. F.; Tan, Frans E. S.

    2008-01-01

    During the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher's information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion. Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the…

  15. The Development of Automaticity in Short-Term Memory Search: Item-Response Learning and Category Learning

    Science.gov (United States)

    Cao, Rui; Nosofsky, Robert M.; Shiffrin, Richard M.

    2017-01-01

    In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across…

  16. Internal Medicine Residents' Perceived Responsibility for Patients at Hospital Discharge: A National Survey.

    Science.gov (United States)

    Young, Eric; Stickrath, Chad; McNulty, Monica C; Calderon, Aaron J; Chapman, Elizabeth; Gonzalo, Jed D; Kuperman, Ethan F; Lopez, Max; Smith, Christopher J; Sweigart, Joseph R; Theobald, Cecelia N; Burke, Robert E

    2016-12-01

    Medical residents are routinely entrusted with transitions of care, yet little is known about the duration or content of their perceived responsibility for patients they discharge from the hospital. To examine the duration and content of internal medicine residents' perceived responsibility for patients they discharge from the hospital. The secondary objective was to determine whether specific individual experiences and characteristics correlate with perceived responsibility. Multi-site, cross-sectional 24-question survey delivered via email or paper-based form. Internal medicine residents (post-graduate years 1-3) at nine university and community-based internal medicine training programs in the United States. Perceived responsibility for patients after discharge as measured by a previously developed single-item tool for duration of responsibility and novel domain-specific questions assessing attitudes towards specific transition of care behaviors. Of 817 residents surveyed, 469 responded (57.4 %). One quarter of residents (26.1 %) indicated that their responsibility for patients ended at discharge, while 19.3 % reported perceived responsibility extending beyond 2 weeks. Perceived duration of responsibility did not correlate with level of training (P = 0.57), program type (P = 0.28), career path (P = 0.12), or presence of burnout (P = 0.59). The majority of residents indicated they were responsible for six of eight transitional care tasks (85.1-99.3 % strongly agree or agree). Approximately half of residents (57 %) indicated that it was their responsibility to directly contact patients' primary care providers at discharge, and 21.6 % indicated that it was their responsibility to ensure that patients attended their follow-up appointments. Internal medicine residents demonstrate variability in perceived duration of responsibility for recently discharged patients. Neither the duration nor the content of residents' perceived responsibility was

  17. Evaluation of mode equivalence of the MSKCC Bowel Function Instrument, LASA Quality of Life, and Subjective Significance Questionnaire items administered by Web, interactive voice response system (IVRS), and paper.

    Science.gov (United States)

    Bennett, Antonia V; Keenoy, Kathleen; Shouery, Marwan; Basch, Ethan; Temple, Larissa K

    2016-05-01

    To assess the equivalence of patient-reported outcome (PRO) survey responses across Web, interactive voice response system (IVRS), and paper modes of administration. Postoperative colorectal cancer patients with home Web/e-mail and phone were randomly assigned to one of the eight study groups: Groups 1-6 completed the survey via Web, IVRS, and paper, in one of the six possible orders; Groups 7-8 completed the survey twice, either by Web or by IVRS. The 20-item survey, including the MSKCC Bowel Function Instrument (BFI), the LASA Quality of Life (QOL) scale, and the Subjective Significance Questionnaire (SSQ) adapted to bowel function, was completed from home on consecutive days. Mode equivalence was assessed by comparison of mean scores across modes and intraclass correlation coefficients (ICCs) and was compared to the test-retest reliability of Web and IVRS. Of 170 patients, 157 completed at least one survey and were included in analysis. Patients had mean age 56 (SD = 11), 53% were male, 81% white, 53% colon, and 47% rectal cancer; 78% completed all assigned surveys. Mean scores for BFI total score, BFI subscale scores, LASA QOL, and adapted SSQ varied by mode by less than one-third of a score point. ICCs across mode were: BFI total score (Web-paper = 0.96, Web-IVRS = 0.97, paper-IVRS = 0.97); BFI subscales (range = 0.88-0.98); LASA QOL (Web-paper = 0.98, Web-IVRS = 0.78, paper-IVRS = 0.80); and SSQ (Web-paper = 0.92, Web-IVRS = 0.86, paper-IVRS = 0.79). Mode equivalence was demonstrated for the BFI total score, BFI subscales, LASA QOL, and adapted SSQ, supporting the use of multiple modes of PRO data capture in clinical trials.

  18. Designing questionnaires: healthcare survey to compare two different response scales

    Science.gov (United States)

    2014-01-01

    Background: A widely discussed design issue in patient satisfaction questionnaires is the optimal length and labelling of the answering scale. The aim of the present study was to compare intra-individually the answers on two response scales to five general questions evaluating patients’ perception of hospital care. Methods: Between November 2011 and January 2012, all in-hospital patients at a Swiss University Hospital received a patient satisfaction questionnaire on an adjectival scale with three to four labelled categories (LS) and five redundant questions displayed on an 11-point end-anchored numeric scale (NS). The scales were compared concerning ceiling effect, internal consistency (Cronbach’s alpha), individual item answers (Spearman’s rank correlation), and concerning overall satisfaction by calculating an overall percentage score (sum of all answers related to the maximum possible sum). Results: The response rate was 41% (2957/7158), of which 2400 (81%) completely filled out all questions. Baseline characteristics of the responders and non-responders were similar. Floor and ceiling effects were high on both response scales, but more pronounced on the LS than on the NS. Cronbach’s alpha was higher on the NS than on the LS. There was a strong individual item correlation between both answering scales in questions regarding the intent to return, quality of treatment and the judgement whether the patient was treated with respect and dignity, but a lower correlation concerning satisfactory information transfer by physicians or nurses, where only three categories were available in the LS. The overall percentage score showed a comparable distribution, but with a wider spread of lower satisfaction in the NS. Conclusions: Since the longer scale did not substantially reduce the ceiling effect, the type of questions rather than the type of answering scale could be addressed with a focus on specific questions about concrete situations instead of general questions
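
    Two of the quantities named in this record, the overall percentage score and Cronbach's alpha, are simple to compute. The sketch below is a minimal illustration with made-up data, not the study's analysis: it assumes the 11-point numeric scale is coded 0 to 10, that only complete cases are used, and that sample variances are appropriate; the function names are hypothetical.

```python
from statistics import variance

def overall_percentage(answers, scale_max):
    """Sum of all answers relative to the maximum possible sum, in percent."""
    return 100.0 * sum(answers) / (scale_max * len(answers))

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers of four patients to three questions on a 0-10 scale.
responses = [[10, 9, 10], [8, 7, 9], [5, 6, 4], [10, 10, 10]]
for row in responses:
    print(f"{overall_percentage(row, scale_max=10):.1f}%")
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

    If the scale were coded 1 to 11 instead, the percentage score would need the minimum subtracted before scaling.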

  19. Development and Validation of the 34-Item Disability Screening Questionnaire (DSQ-34) for Use in Low and Middle Income Countries Epidemiological and Development Surveys.

    Directory of Open Access Journals (Sweden)

    Jean-François Trani

    Although 80% of persons with disabilities live in low and middle-income countries, there is still a lack of comprehensive, cross-culturally validated tools to identify persons facing activity limitations and functioning difficulties in these settings. In absence of such a tool, disability estimates vary considerably according to the methodology used, and policies are based on unreliable estimates. The Disability Screening Questionnaire composed of 27 items (DSQ-27) was initially designed by a group of international experts in survey development and disability in Afghanistan for a national survey. Items were selected based on major domains of activity limitations and functioning difficulties linked to an impairment as defined by the International Classification of Functioning, Disability and Health. Face, content and construct validity, as well as sensitivity and specificity were examined. Based on the results obtained, the tool was subsequently refined and expanded to 34 items, tested and validated in Darfur, Sudan. Internal consistency for the total DSQ-34 using a raw and standardized Cronbach's Alpha and within each domain using a standardized Cronbach's Alpha was examined in the Asian context (India and Nepal). Exploratory factor analysis (EFA) using principal axis factoring (PAF) evaluated the lowest number of factors to account for the common variance among the questions in the screen. Test-retest reliability was determined by calculating intraclass correlation (ICC) and inter-rater reliability by calculating the kappa statistic; results were checked using Bland-Altman plots. The DSQ-34 was further tested for standard error of measurement (SEM) and for the minimum detectable change (MDC). Good internal consistency was indicated by Cronbach's Alpha of 0.83/0.82 for India and 0.76/0.78 for Nepal. We confirmed our assumption for EFA using the Kaiser-Meyer-Olkin measure of sampling well above the accepted cutoff of 0.40 for India (0.82) and Nepal (0

  20. Development and Validation of the 34-Item Disability Screening Questionnaire (DSQ-34) for Use in Low and Middle Income Countries Epidemiological and Development Surveys

    Science.gov (United States)

    Trani, Jean-François; Babulal, Ganesh Muneshwar; Bakhshi, Parul

    2015-01-01

    Background Although 80% of persons with disabilities live in low and middle-income countries, there is still a lack of comprehensive, cross-culturally validated tools to identify persons facing activity limitations and functioning difficulties in these settings. In absence of such a tool, disability estimates vary considerably according to the methodology used, and policies are based on unreliable estimates. Methods and Findings The Disability Screening Questionnaire composed of 27 items (DSQ-27) was initially designed by a group of international experts in survey development and disability in Afghanistan for a national survey. Items were selected based on major domains of activity limitations and functioning difficulties linked to an impairment as defined by the International Classification of Functioning, Disability and Health. Face, content and construct validity, as well as sensitivity and specificity were examined. Based on the results obtained, the tool was subsequently refined and expanded to 34 items, tested and validated in Darfur, Sudan. Internal consistency for the total DSQ-34 using a raw and standardized Cronbach’s Alpha and within each domain using a standardized Cronbach’s Alpha was examined in the Asian context (India and Nepal). Exploratory factor analysis (EFA) using principal axis factoring (PAF) evaluated the lowest number of factors to account for the common variance among the questions in the screen. Test-retest reliability was determined by calculating intraclass correlation (ICC) and inter-rater reliability by calculating the kappa statistic; results were checked using Bland-Altman plots. The DSQ-34 was further tested for standard error of measurement (SEM) and for the minimum detectable change (MDC). Good internal consistency was indicated by Cronbach’s Alpha of 0.83/0.82 for India and 0.76/0.78 for Nepal. We confirmed our assumption for EFA using the Kaiser-Meyer-Olkin measure of sampling well above the accepted cutoff of 0.40 for

  1. Using the Item Response Theory (IRT) in the construction of an item bank for the evaluation of verbal reasoning

    Directory of Open Access Journals (Sweden)

    Wagner Bandeira Andriola

    1998-01-01

    The purpose of this research was to organize an item bank for the evaluation of verbal reasoning using Item Response Theory (IRT). Using the responses of 730 high school students, with an average age of 17.7 years (SD = 3.12), to a set of 51 items in the form of verbal analogies, difficulty and discrimination were estimated with the two-parameter logistic model. The item characteristic curves (ICCs) were also determined.
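
    For readers unfamiliar with the model named in this record, the item characteristic curve of a two-parameter logistic item can be sketched as follows. This is a generic textbook form, not the study's code: the parameter values are hypothetical, the function name is an assumption, and the logistic metric is used without the 1.7 scaling constant that some software applies.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with discrimination a = 1.2 and difficulty b = 0.5.
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(icc_2pl(theta, a=1.2, b=0.5), 3))
```

    At theta equal to b the probability is exactly 0.5, and a larger a makes the curve steeper around that point.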

  2. Unidimensional Interpretations for Multidimensional Test Items

    Science.gov (United States)

    Kahraman, Nilufer

    2013-01-01

    This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…

  4. Toward a More Responsive Consumable Materiel Supply Chain: Leveraging New Metrics to Identify and Classify Items of Concern

    Science.gov (United States)

    2016-06-01

    ...the item, and time between orders. Named “Peak” and “Next Gen,” these two optimization calculations are held by a contracting vendor as proprietary... for further scrutiny. Classification and regression trees (CART), implemented in the R software environment via the RPART package (Therneau, Atkinson

  5. Teoria de Resposta ao Item na análise de uma prova de estatística em universitários Item Response Theory to analyze a statistics test in university students

    Directory of Open Access Journals (Sweden)

    Claudette Maria Medeiros Vendramini

    2005-12-01

    Full Text Available This study applied Item Response Theory to analyze the 15 multiple-choice questions of a statistics test in which the questions were presented as statistical graphs or tables. The participants were 413 university students, selected by convenience from two private higher-education institutions, predominantly from the Psychology course (91.5%). Most students were female (80%) and attended daytime classes (69.8%); ages ranged from 16 to 53 years (mean 24.4, standard deviation 7.4). The test is predominantly unidimensional, and the items are best fitted by the three-parameter logistic model. The discrimination, difficulty, and biserial correlation indices showed acceptable values. The results reveal students' difficulties with mathematical and statistical concepts, difficulties also observed in other studies from elementary education onward. It is suggested that these concepts be treated in greater depth in higher education.

  6. Prenotification, Incentives, and Survey Modality: An Experimental Test of Methods to Increase Survey Response Rates of School Principals

    Science.gov (United States)

    Jacob, Robin Tepper; Jacob, Brian

    2012-01-01

    Teacher and principal surveys are among the most common data collection techniques employed in education research. Yet there is remarkably little research on survey methods in education, or about the most cost-effective way to raise response rates among teachers and principals. In an effort to explore various methods for increasing survey response…

  8. The Effects of Survey Administration on Disclosure Rates to Sensitive Items Among Men: A Comparison of an Internet Panel Sample with a RDD Telephone Sample.

    Science.gov (United States)

    Hines, Denise A; Douglas, Emily M; Mahmood, Sehar

    2010-11-01

    Research using Internet surveys is an emerging field, yet research on the legitimacy of using Internet studies, particularly those targeting sensitive topics, remains under-investigated. The current study builds on the existing literature by exploring the demographic differences between Internet panel and RDD telephone survey samples, as well as differences in responses with regard to experiences of intimate partner violence perpetration and victimization, alcohol and substance use/abuse, PTSD symptomatology, and social support. Analyses indicated that after controlling for demographic differences, there were few differences between the samples in their disclosure of sensitive information, and that the online sample was more socially isolated than the phone sample. Results are discussed in terms of their implications for using Internet samples in research on sensitive topics.

  9. Factors associated with survey response in hand surgery research.

    Science.gov (United States)

    Bot, Arjan G J; Anderson, Jade A; Neuhaus, Valentin; Ring, David

    2013-10-01

    A low response rate is believed to decrease the validity of survey studies. Factors associated with nonresponse to surveys are poorly characterized in orthopaedic research. This study addressed whether (1) psychologic factors; (2) demographics; (3) illness-related factors; and (4) pain are predictors of a lower likelihood of a patient returning a mailed survey. One hundred four adult, new or return patients completed questionnaires including the Pain Catastrophizing Scale, Patient Health Questionnaire-9 depression scale, Short Health Anxiety Index, demographics, and a pain scale (0-10) during a routine visit to a hand and upper extremity surgeon. Of these patients, 38% had undergone surgery and the remainder was seen for various other conditions. Six months after their visit, patients were mailed the DASH questionnaire and a scale to rate their satisfaction with the visit (0-10). Bivariate analysis and logistic regression were used to determine risk factors for being a nonresponder to the followup of this study. The cohort consisted of 57 women and 47 men with a mean age of 51 years with various diagnoses. Thirty-five patients (34%) returned the questionnaire. Responders were satisfied with their visit (mean satisfaction, 8.7) and had a DASH score of 9.6. Compared with patients who returned the questionnaires, nonresponders had higher pain catastrophizing scores, were younger, more frequently male, and had more pain at enrollment. In logistic regression, male sex (odds ratio [OR], 2.6), pain (OR, 1.3), and younger age (OR, 1.03) were associated with not returning the questionnaire. Survey studies should be interpreted in light of the fact that patients who do not return questionnaires in a hand surgery practice differ from patients who do return them. Hand surgery studies that rely on questionnaire evaluation remote from study enrollment should include tactics to improve the response of younger, male patients with more pain. Level II, prognostic study. See

  10. Comparing the Personality Disorder Interview for DSM-IV (PDI-IV) and SCID-II borderline personality disorder scales: an item-response theory analysis.

    Science.gov (United States)

    Huprich, Steven K; Paggeot, Amy V; Samuel, Douglas B

    2015-01-01

    One-hundred sixty-nine psychiatric outpatients and 171 undergraduate students were assessed with the Personality Disorder Interview-IV (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995) and the Structured Clinical Interview for DSM-IV Axis II disorders (SCID-II; First, Gibbon, Spitzer, Williams, & Benjamin, 1997) for borderline personality disorder (BPD). Eighty individuals met PDI-IV BPD criteria, whereas 34 met SCID-II BPD criteria. Dimensional ratings of both measures were highly intercorrelated (rs = .78, .75), and item-level interrater reliability fell in the good to excellent range. An item-response theory analysis was performed to investigate whether properties of the items from each interview could help understand these differences. The limited agreement seemed to be explained by differences in the response options across the two interviews. We found that suicidal behavior was among the most discriminating criteria on both instruments, whereas dissociation and difficulty controlling anger had the 2 lowest alpha parameter values. Finally, those meeting BPD criteria on both interviews had higher levels of anxiety, depression, and more impairments in object relations than those meeting criteria on just the PDI-IV. These findings suggest that the choice of measure has a notable effect on the obtained diagnostic prevalence and the level of BPD severity that is detected.

  11. Natural History of Dependency in the Elderly: A 24-Year Population-Based Study Using a Longitudinal Item Response Theory Model.

    Science.gov (United States)

    Edjolo, Arlette; Proust-Lima, Cécile; Delva, Fleur; Dartigues, Jean-François; Pérès, Karine

    2016-02-15

    We aimed to describe the hierarchical structure of Instrumental Activities of Daily Living (IADL) and basic Activities of Daily Living (ADL) and trajectories of dependency before death in an elderly population using item response theory methodology. Data were obtained from a population-based French cohort study, the Personnes Agées QUID (PAQUID) Study, of persons aged ≥65 years at baseline in 1988 who were recruited from 75 randomly selected areas in Gironde and Dordogne. We evaluated IADL and ADL data collected at home every 2-3 years over a 24-year period (1988-2012) for 3,238 deceased participants (43.9% men). We used a longitudinal item response theory model to investigate the item sequence of 11 IADL and ADL combined into a single scale and functional trajectories adjusted for education, sex, and age at death. The findings confirmed the earliest losses in IADL (shopping, transporting, finances) at the partial limitation level, and then an overlapping of concomitant IADL and ADL, with bathing and dressing being the earliest ADL losses, and finally total losses for toileting, continence, eating, and transferring. Functional trajectories were sex-specific, with a benefit of high education that persisted until death in men but was only transient in women. An in-depth understanding of this sequence provides an early warning of functional decline for better adaptation of medical and social care in the elderly.

  12. 77 FR 20887 - Proposed Information Collection (National Acquisition Center Customer Response Survey) Activity...

    Science.gov (United States)

    2012-04-06

    ... solicits comments on the information needed to measure customer satisfaction with delivered products and... AFFAIRS Proposed Information Collection (National Acquisition Center Customer Response Survey) Activity...: Department of Veterans Affairs (VA) National Acquisition Center Customer Response Survey, VA Form 0863....

  13. Survey of spectral response measurements for photovoltaic devices

    Energy Technology Data Exchange (ETDEWEB)

    Hartman, J.S.; Lind, M.A.

    1981-11-01

    A survey of the photovoltaic community was conducted to ascertain the present state-of-the-art for PV spectral response measurements. Specific topics explored included measurement system designs, good and bad features of the systems, and problems encountered in the evaluation of specific cell structures and materials. The survey showed that most spectral response data are used in diagnostic analysis for the optimization of developmental solar cells. Measurement systems commonly utilize a chopped narrowband source in conjunction with a constant bias illumination which simulates the ambient end use environment. Researchers emphasized the importance of bias illumination for all types of cells in order to minimize the effects of nonlinearities in cell response. Not surprisingly, single crystal silicon cells present the fewest measurement problems to the researcher and have been studied more thoroughly than any other type of solar cell. Even so, the accurate characterization of silicon cells is still difficult, and laboratory intercomparison studies have yielded data scatter ranging from ±5% to ±15%. The measurement experience with other types of cells is less extensive. The development of reliable data bases for some solar cells is complicated by problems of cell nonuniformity, environmental instability, nonlinearity, etc. Cascade cells present new problems associated with their structure (multiple cells in series) which are just beginning to be understood. In addition, the importance of many measurement parameters (spectral content of bias light, bias light intensity, bias voltage, chopping frequency, etc.) is not fully understood for most types of solar cells.

  14. Detection of Differential Item Functioning Using the Lasso Approach

    Science.gov (United States)

    Magis, David; Tuerlinckx, Francis; De Boeck, Paul

    2015-01-01

    This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…

  15. Validation of Portuguese version of Quality of Erection Questionnaire (QEQ) and comparison to International Index of Erectile Function (IIEF) and RAND 36-Item Health Survey.

    Science.gov (United States)

    Reis, Ana Luiza; Reis, Leonardo Oliveira; Saade, Ricardo Destro; Santos, Carlos Alberto; Lima, Marcelo Lopes de; Fregonesi, Adriano

    2015-01-01

    To validate the Quality of Erection Questionnaire (QEQ) considering Brazilian social-cultural aspects. To determine equivalence between the Portuguese and the English QEQ versions, the Portuguese version was back-translated by two professors who are native English speakers. After language equivalence had been determined, urologists considered the QEQ Portuguese version suitable. Men with self-reported erectile dysfunction (ED) and infertile men who had a stable sexual relationship for at least 6 months were invited to answer the QEQ, the International Index of Erectile Function (IIEF) and the RAND 36-Item Health Survey (RAND-36). The questionnaires were presented together and answered without help in a private room. Internal consistency (Cronbach's α), test-retest reliability (Spearman), convergent validity (Spearman correlation) coefficients and known-groups validity (the ability of the QEQ Portuguese version to differentiate erectile dysfunction severity groups) were assessed. We recruited 197 men (167 ED patients and 30 non-ED patients), mean age of 53.3 and median of 55.5 years (23-82 years). The Portuguese version of the QEQ had high internal consistency (Cronbach α=0.93), high stability between test and retest (ICC 0.83, 95% CI: 0.76-0.88, p… Portuguese version presented good psychometric properties and high convergent validity in relation to IIEF. The low correlations between the QEQ and the RAND-36, as well as between the IIEF and the RAND-36, indicated IIEF and QEQ specificity, which may have resulted from the patients' psychological adaptations that minimized the impact of ED on Quality of Life (QoL) and reestablished the feeling of well-being.

  16. Performance of the Family Satisfaction with the End-of-Life Care (FAMCARE) measure in an ethnically diverse cohort: psychometric analyses using item response theory.

    Science.gov (United States)

    Teresi, Jeanne A; Ornstein, Katherine; Ocepek-Welikson, Katja; Ramirez, Mildred; Siu, Albert

    2014-02-01

    The Family Satisfaction with End-of-Life Care (FAMCARE) has been used widely among caregivers to individuals with cancer. The aim of this study was to evaluate the psychometric properties of this measure using item response theory (IRT). The analytic sample was comprised of caregivers to 1,983 patients with advanced cancer. Among the patients, 56 % were females, with mean age 59.9 years (s.d. = 11.8), 20 % were non-Hispanic Black. The majority were family members either living with (44 %) or not living with (35 %) the patient. Factor analyses and IRT were used to examine the dimensionality, information, and reliability of the FAMCARE. Although a bi-factor model fit the data slightly better than did a unidimensional model, the loadings on the group factors were very low. Thus, a unidimensional model appears to provide adequate representation for the item set. The reliability estimates, calculated along the satisfaction (theta) continuum, were adequate (>0.80) for all levels of theta for which subjects had scores. Examination of the category response functions from IRT showed overlap in the lower categories with little unique information provided; moreover, the categories were not observed to be interval. Based on these analyses, a three-response category format was recommended: very satisfied, satisfied, and not satisfied. Most information was provided in the range indicative of either dissatisfaction or high satisfaction. These analyses support the use of fewer response categories and provide item parameters that form a basis for developing shorter-form scales. Such a revision has the potential to reduce respondent burden.

  17. Harmonization of Neuroticism and Extraversion phenotypes across inventories and cohorts in the Genetics of Personality Consortium: an application of Item Response Theory.

    Science.gov (United States)

    van den Berg, Stéphanie M; de Moor, Marleen H M; McGue, Matt; Pettersson, Erik; Terracciano, Antonio; Verweij, Karin J H; Amin, Najaf; Derringer, Jaime; Esko, Tõnu; van Grootheest, Gerard; Hansell, Narelle K; Huffman, Jennifer; Konte, Bettina; Lahti, Jari; Luciano, Michelle; Matteson, Lindsay K; Viktorin, Alexander; Wouda, Jasper; Agrawal, Arpana; Allik, Jüri; Bierut, Laura; Broms, Ulla; Campbell, Harry; Smith, George Davey; Eriksson, Johan G; Ferrucci, Luigi; Franke, Barbera; Fox, Jean-Paul; de Geus, Eco J C; Giegling, Ina; Gow, Alan J; Grucza, Richard; Hartmann, Annette M; Heath, Andrew C; Heikkilä, Kauko; Iacono, William G; Janzing, Joost; Jokela, Markus; Kiemeney, Lambertus; Lehtimäki, Terho; Madden, Pamela A F; Magnusson, Patrik K E; Northstone, Kate; Nutile, Teresa; Ouwens, Klaasjan G; Palotie, Aarno; Pattie, Alison; Pesonen, Anu-Katriina; Polasek, Ozren; Pulkkinen, Lea; Pulkki-Råback, Laura; Raitakari, Olli T; Realo, Anu; Rose, Richard J; Ruggiero, Daniela; Seppälä, Ilkka; Slutske, Wendy S; Smyth, David C; Sorice, Rossella; Starr, John M; Sutin, Angelina R; Tanaka, Toshiko; Verhagen, Josine; Vermeulen, Sita; Vuoksimaa, Eero; Widen, Elisabeth; Willemsen, Gonneke; Wright, Margaret J; Zgaga, Lina; Rujescu, Dan; Metspalu, Andres; Wilson, James F; Ciullo, Marina; Hayward, Caroline; Rudan, Igor; Deary, Ian J; Räikkönen, Katri; Arias Vasquez, Alejandro; Costa, Paul T; Keltikangas-Järvinen, Liisa; van Duijn, Cornelia M; Penninx, Brenda W J H; Krueger, Robert F; Evans, David M; Kaprio, Jaakko; Pedersen, Nancy L; Martin, Nicholas G; Boomsma, Dorret I

    2014-07-01

    Mega- or meta-analytic studies (e.g. genome-wide association studies) are increasingly used in behavior genetics. An issue in such studies is that phenotypes are often measured by different instruments across study cohorts, requiring harmonization of measures so that more powerful fixed effect meta-analyses can be employed. Within the Genetics of Personality Consortium, we demonstrate for two clinically relevant personality traits, Neuroticism and Extraversion, how Item-Response Theory (IRT) can be applied to map item data from different inventories to the same underlying constructs. Personality item data were analyzed in >160,000 individuals from 23 cohorts across Europe, USA and Australia in which Neuroticism and Extraversion were assessed by nine different personality inventories. Results showed that harmonization was very successful for most personality inventories and moderately successful for some. Neuroticism and Extraversion inventories were largely measurement invariant across cohorts, in particular when comparing cohorts from countries where the same language is spoken. The IRT-based scores for Neuroticism and Extraversion were heritable (48 and 49 %, respectively, based on a meta-analysis of six twin cohorts, total N = 29,496 and 29,501 twin pairs, respectively) with a significant part of the heritability due to non-additive genetic factors. For Extraversion, these genetic factors qualitatively differ across sexes. We showed that our IRT method can lead to a large increase in sample size and therefore statistical power. The IRT approach may be applied to any mega- or meta-analytic study in which item-based behavioral measures need to be harmonized.

  18. Clinical Validation of the Nursing Outcome "Swallowing Status" in People with Stroke: Analysis According to the Classical and Item Response Theories.

    Science.gov (United States)

    Oliveira-Kumakura, Ana Railka de Souza; de Araujo, Thelma Leite; Costa, Alice Gabrielle de Sousa; Cavalcante, Tahissa Frota; Lopes, Marcos Venícios de Oliveira; Carvalho, Emilia Campos

    2017-09-19

    To clinically validate the nursing outcome "Swallowing status". The fit of the nursing outcome was investigated according to the Classical and Item Response Theories. The models were compared regarding information loss, goodness-of-fit, and differential item functioning. Stability and internal consistency were examined. The nursing outcome fit best under the generalized partial credit model with different discrimination parameters, which demonstrated unidimensionality of the outcome. Strong correlations among the scores of each indicator were observed. There was no differential item functioning of the outcome indicators. The scale presented high internal consistency (Cronbach's α = .954) and stability (> .800). This study presents a valid nursing outcome. Relevance for clinical practice: more accurate monitoring of sensitivity to an intervention. © 2017 NANDA International, Inc.

  19. Reliability, validity and responsiveness of a Norwegian version of the Chronic Sinusitis Survey

    Directory of Open Access Journals (Sweden)

    Røssberg Edna

    2006-05-01

    Full Text Available Abstract Background The Chronic Sinusitis Survey (CSS) is a valid, disease-specific questionnaire for assessing health status and treatment effectiveness in chronic rhinosinusitis. In the present study, we developed a Norwegian version of the CSS and assessed its psychometric properties. Methods In the pooled data set of 65 patients from a trial of treatment for chronic sinusitis with long-standing symptoms and signs of sinusitis on computed tomography (CT), we assessed the reliability, validity and responsiveness of the CSS. Results Test-retest reliability of the two CSS scales and the total scale ranged 0.87–0.92, while internal consistency reliability ranged 0.31–0.55. CSS subscale scores were associated with other items on sinusitis symptoms, and with the Mental health and Bodily pain scales of the SF-36. There was little association of the CSS scale scores with sinus CT findings. The patients with chronic sinusitis had worse scores on all three CSS scales than a healthy reference population (n = 42) (p … Conclusion The Norwegian version of the CSS had acceptable test-retest reliability, but lower internal consistency reliability than the accepted standard criteria. The results support the construct validity of the measure, and the sinusitis symptoms subscale and the total scale were responsive to change. This supports the use of the questionnaire in interventions for chronic sinusitis, but points to problems with the internal consistency reliability.
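    Internal consistency reliability, as reported in this abstract, is commonly summarized with Cronbach's alpha. A minimal sketch follows, assuming complete data arranged as one row of item scores per respondent; the function name `cronbach_alpha` and the example data are hypothetical and are not drawn from the study.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])  # number of items
    item_variances = [
        sample_variance([row[j] for row in scores]) for j in range(k)
    ]
    total_variance = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses: 4 respondents x 3 items.
    responses = [
        [2, 3, 3],
        [4, 4, 5],
        [1, 2, 1],
        [3, 3, 4],
    ]
    print(f"alpha = {cronbach_alpha(responses):.3f}")
```

    Alpha approaches 1 when items covary strongly relative to their individual variances, and falls toward 0 when the items are nearly uncorrelated.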

  20. Quality of life in the Danish general population--normative data and validity of WHOQOL-BREF using Rasch and item response theory models

    DEFF Research Database (Denmark)

    Noerholm, V; Groenvold, M; Watt, T

    2004-01-01

    BACKGROUND: The main objective of this study was to investigate the construct validity of the WHOQOL-BREF by use of Rasch and Item Response Theory models and to examine the stability of the model across high/low scoring individuals, gender, education, and depressive illness. Furthermore, the objective of the study was to estimate the reference data for the quality of life questionnaire WHOQOL-BREF in the general Danish population and in subgroups defined by age, gender, and education. METHODS: Mail-out-mail-back questionnaires were sent to a randomly selected sample of the Danish general...

  1. Item calibration in incomplete testing designs

    NARCIS (Netherlands)

    Eggen, Theo J.H.M.; Verhelst, Norman D.

    2011-01-01

    This study discusses the justifiability of item parameter estimation in incomplete testing designs in item response theory. Marginal maximum likelihood (MML) as well as conditional maximum likelihood (CML) procedures are considered in three commonly used incomplete designs: random incomplete, multis

  2. Items of the Montgomery-Åsberg Depression Rating Scale Associated With Response to Paroxetine Treatment in Patients With Major Depressive Disorder.

    Science.gov (United States)

    Tomita, Tetsu; Sato, Yasushi; Nakagami, Taku; Tsuchimine, Shoko; Kaneda, Ayako; Kaneko, Sunao; Nakamura, Kazuhiko; Yasui-Furukori, Norio

    2016-01-01

    In the present study, we investigated the association between the severity of each symptom evaluated by the Montgomery-Åsberg Depression Rating Scale (MADRS) at baseline and responsiveness to treatment in patients with major depressive disorder (MDD) to identify the items that predict treatment response. The patients received a diagnosis of MDD if they had a score greater than 20 points on the MADRS. Following admission, 120 patients were enrolled in the study, and 89 patients completed the study. For the first week, a 20-mg/d dose of paroxetine was administered; thereafter, the dose was increased to 40 mg/d. The MADRS was applied at baseline and after 1, 2, 4, and 6 weeks. We defined responders as patients with improvements in their MADRS scores of more than 50% after 6 weeks of treatment. A multiple regression analysis of MADRS scores at 6 weeks was performed to identify patients who responded to treatment. There was a significant difference between responders and nonresponders in the reported sadness (RS) score for all MADRS items. In the multiple logistic regression analysis, only the RS and concentration difficulties (C) scores showed a significant association with treatment response. Based on the results of χ² tests, RS score cutoff values of 2/3 and 3/4 revealed significant differences in the responder rate. None of the cutoff values for the C score revealed significant differences. The RS score was significantly associated with responsiveness to paroxetine treatment for MDD, with higher RS scores predicting poor responses to treatment.

  3. The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value.

    Science.gov (United States)

    Rush, Bonnie R; Rankin, David C; White, Brad J

    2016-09-29

    Failure to adhere to standard item-writing guidelines may render examination questions easier or more difficult than intended. Item complexity describes the cognitive skill level required to obtain a correct answer. Higher cognitive examination items promote critical thinking and are recommended to prepare students for clinical training. This study evaluated faculty-authored examinations to determine the impact of item-writing flaws and item complexity on the difficulty and discrimination value of examination items used to assess third year veterinary students. The impact of item-writing flaws and item complexity (cognitive level I-V) on examination item difficulty and discrimination value was evaluated on 1925 examination items prepared by clinical faculty for third year veterinary students. The mean (± SE) percent correct (83.3 % ± 17.5) was consistent with target values in professional education, and the mean discrimination index (0.18 ± 0.17) was slightly lower than recommended (0.20). More than one item-writing flaw was identified in 37.3 % of questions. The most common item-writing flaws were awkward stem structure, implausible distractors, longest response is correct, and responses are series of true-false statements. Higher cognitive skills (complexity level III-IV) were required to correctly answer 38.4 % of examination items. As item complexity increased, item difficulty and discrimination values increased. The probability of writing discriminating, difficult examination items decreased when implausible distractors and all of the above were used, and increased if the distractors were comprised of a series of true/false statements. Items with four distractors were not more difficult or discriminating than items with three distractors. Preparation of examination questions targeting higher cognitive levels will increase the likelihood of constructing discriminating items. Use of implausible distractors to complete a five-option multiple choice

  4. How we know it hurts: item analysis of written narratives reveals distinct neural responses to others' physical pain and emotional suffering.

    Directory of Open Access Journals (Sweden)

    Emile Bruneau

    Full Text Available People are often called upon to witness, and to empathize with, the pain and suffering of others. In the current study, we directly compared neural responses to others' physical pain and emotional suffering by presenting participants (n = 41) with 96 verbal stories, each describing a protagonist's physical and/or emotional experience, ranging from neutral to extremely negative. A separate group of participants rated "how much physical pain" and "how much emotional suffering" the protagonist experienced in each story, as well as how "vivid and movie-like" the story was. Although ratings of Pain, Suffering and Vividness were positively correlated with each other across stories, item-analyses revealed that each scale was correlated with activity in distinct brain regions. Even within regions of the "Shared Pain network" identified using a separate data set, responses to others' physical pain and emotional suffering were distinct. More broadly, item analyses with continuous predictors provided a high-powered method for identifying brain regions associated with specific aspects of complex stimuli, such as verbal descriptions of physical and emotional events.

  5. Accuracy of responses from postal surveys about continuing medical education and information behavior: experiences from a survey among German diabetologists

    Directory of Open Access Journals (Sweden)

    Trelle Sven

    2002-08-01

    Background: Postal surveys are a popular instrument for studies about continuing medical education habits, but little is known about the accuracy of responses in such surveys. The objective of this study was to quantify the magnitude of inaccurate responses in a postal survey among physicians. Methods: A sub-analysis of a questionnaire about continuing medical education habits and information management was performed. The five variables used for the quantitative analysis are based on a question about the knowledge of a fictitious technical term and on inconsistencies in contingency tables of answers to logically connected questions. Results: The response rate was 52%. Non-response bias is possible but seems unlikely, since no association between demographic variables and inconsistent responses could be found. About 10% of responses were inaccurate according to the definition. Conclusion: A sub-analysis of a questionnaire makes it possible to quantify inaccurate responses in postal surveys. This sub-analysis revealed that a notable portion of responses in a postal survey about continuing medical education habits and information management was inaccurate.

  6. How Important Are High Response Rates for College Surveys?

    Science.gov (United States)

    Fosnacht, Kevin; Sarraf, Shimon; Howe, Elijah; Peck, Leah K.

    2017-01-01

    Surveys play an important role in understanding the higher education landscape. About 60 percent of the published research in major higher education journals utilized survey data (Pike, 2007). Institutions also commonly use surveys to assess student outcomes and evaluate programs, instructors, and even cafeteria food. However, declining survey…

  8. Self-reported responsiveness to direct-to-consumer drug advertising and medication use: results of a national survey

    Directory of Open Access Journals (Sweden)

    Somes Grant W

    2011-09-01

    Background: Direct-to-consumer (DTC) marketing of pharmaceuticals is controversial, yet effective. Little is known relating patterns of medication use to patient responsiveness to DTC. Methods: We conducted a secondary analysis of data collected in a national telephone survey on knowledge of and attitudes toward DTC advertisements. The survey of 1081 U.S. adults (response rate = 65%) was conducted by the Food and Drug Administration (FDA). Responsiveness to DTC was defined as an affirmative response to the item: "Has an advertisement for a prescription drug ever caused you to ask a doctor about a medical condition or illness of your own that you had not talked to a doctor about before?" Patients reported the number of prescription and over-the-counter (OTC) medicines taken as well as demographic and personal health information. Results: Of 771 respondents who met study criteria, 195 (25%) were responsive to DTC. Only 7% of respondents taking no prescription medications were responsive, whereas 45% of respondents taking 5 or more prescription medications were responsive. This trend remained significant (p trend .0009) even when controlling for age, gender, race, educational attainment, income, self-reported health status, and whether respondents "liked" DTC advertising. There was no relationship between the number of OTC medications taken and the propensity to discuss health-related problems in response to DTC advertisements (p = .4). Conclusion: There is a strong cross-sectional relationship between the number of prescription, but not OTC, drugs used and responsiveness to DTC advertising. Although this relationship could be explained by physician compliance with patient requests for medications, it is also plausible that DTC advertisements have a particular appeal to patients prone to taking multiple medications. Outpatients motivated to discuss medical conditions based on their exposure to DTC advertising may require a careful medication history to evaluate for

  9. Generalized Full-Information Item Bifactor Analysis

    Science.gov (United States)

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of…

  10. The 12-item medical outcomes study short form health survey version 2.0 (SF-12v2): a population-based validation study from Tehran, Iran

    Directory of Open Access Journals (Sweden)

    Omidvari Sepideh

    2011-03-01

    Background: The SF-12v2 is the improved version of the SF-12v1. This study aimed to validate the SF-12v2 in Iran. Methods: A random sample of the general population aged 18 years and over living in Tehran, Iran completed the instrument. Reliability was estimated using internal consistency, and validity was assessed using known-groups comparison and convergent validity. In addition, the factor structure of the questionnaire was extracted by performing both exploratory and confirmatory factor analyses (EFA and CFA). Results: In all, 3685 individuals were studied (1887 male and 1798 female). Internal consistency for both summary measures was satisfactory. Cronbach's α for the Physical Component Summary (PCS-12) was 0.87 and for the Mental Component Summary (MCS-12) it was 0.82. Known-groups comparison showed that the SF-12v2 discriminated well between men and women and those who differed in age and educational status (P …). Conclusion: Although the findings could not be generalized to the Iranian population, overall the findings suggest that the SF-12v2 is a reliable and valid measure of health related quality of life among Iranians and now could be used in future health outcome studies. However, further studies are recommended to establish its stability, responsiveness to change, and concurrent validity for this health survey in Iran.

  11. Health system responsiveness and chronic disease care - What is the role of disease management programs? An analysis based on cross-sectional survey and administrative claims data.

    Science.gov (United States)

    Röttger, Julia; Blümel, Miriam; Linder, Roland; Busse, Reinhard

    2017-07-01

    Health system responsiveness is an important aspect of health systems performance. The concept of responsiveness relates to the interpersonal and contextual aspects of health care. While disease management programs (DMPs) aim to improve the quality of health care (e.g. by improving the coordination of care), it has not yet been analyzed whether these programs improve the perceived health system responsiveness. Our study aims to close this gap by analyzing the differences in the perceived health system responsiveness between DMP-participants and non-participants. We used linked survey and administrative claims data from 7037 patients with coronary heart disease in Germany. Of those, 5082 were enrolled and 1955 were not enrolled in the DMP. Responsiveness was assessed with an adapted version of the WHO responsiveness questionnaire in a postal survey in 2013. The survey covered 9 dimensions of responsiveness and included 17 items each for GP and specialist care. Each item had five answer categories (very good to very bad). We handled missing values in the covariates by multiple imputation and applied propensity score matching (PSM) to control for differences between the two groups (DMP/non-DMP). We used the Wilcoxon signed-rank and McNemar tests to analyze differences regarding the reported responsiveness. The PSM led to a matched and well balanced sample of 1921 pairs. Overall, DMP-participants rated the responsiveness of care more positively. The main difference was found for the coordination of care at the GP, with 62.0% of 1703 non-participants reporting a "good" or "very good" experience, compared to 69.1% of 1703 participants (p < 0.001). The results of our study indicate an overall high responsiveness for CHD care, for DMP-participants as well as for non-participants. Yet, the results also clearly indicate that there is still a need to improve the coordination of care. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Determining Possible Professionals and Respective Roles and Responsibilities for a Model Comprehensive Elder Abuse Intervention: A Delphi Consensus Survey.

    Directory of Open Access Journals (Sweden)

    Janice Du Mont

    We have undertaken a multi-phase, multi-method program of research to develop, implement, and evaluate a comprehensive hospital-based nurse examiner elder abuse intervention that addresses the complex functional, social, forensic, and medical needs of older women and men. In this study, we determined the importance of possible participating professionals and respective roles and responsibilities within the intervention. Using a modified Delphi methodology, recommended professionals and their associated roles and responsibilities were generated from a systematic scoping review of relevant scholarly and grey literatures. These items were reviewed, new items added for review, and rated/re-rated for their importance to the intervention on a 5-point Likert scale by an expert panel during a one day in-person meeting. Items that did not achieve consensus were subsequently re-rated in an online survey. Those items that achieved a mean Likert rating of 4+ (rated important to very important), and an interquartile range <1 in the first or second round, and/or for which 80% of ratings were 4+ in the second round were retained for the model elder abuse intervention. Twenty-two of 31 recommended professionals and 192 of 229 recommended roles and responsibilities rated were retained for our model elder abuse intervention. Retained professionals were: public guardian and trustee (mean rating = 4.88), geriatrician (4.87), police officer (4.87), GEM (geriatric emergency management) nurse (4.80), GEM social worker (4.78), community health worker (4.76), social worker/counsellor (4.74), family physician in community (4.71), paramedic (4.65), financial worker (4.59), lawyer (4.59), pharmacist (4.59), emergency physician (4.57), geriatric psychiatrist (4.33), occupational therapist (4.29), family physician in hospital (4.28), Crown prosecutor (4.24), neuropsychologist (4.24), bioethicist (4.18), caregiver advocate (4.18), victim support worker (4.18), and respite care worker (4.12). A large and
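    The retention rule described in this abstract can be sketched in a few lines. This is only an illustrative reading, not the authors' procedure: the function names `meets_consensus` and `retained` are hypothetical, the quartile method is an assumption (the abstract does not say how the interquartile range was computed), and the "and/or" wording is interpreted as "mean 4+ with IQR < 1 in either round, or at least 80% of second-round ratings at 4+".

```python
from statistics import mean, quantiles


def meets_consensus(ratings):
    """Mean Likert rating of 4+ together with an interquartile range below 1."""
    # Assumption: inclusive quartiles; the abstract does not name a method.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return mean(ratings) >= 4 and (q3 - q1) < 1


def retained(round1, round2=None):
    """Apply the retention rule to one item's ratings (hypothetical helper)."""
    if meets_consensus(round1):
        return True
    if round2 is None:
        return False
    # Share of second-round ratings that are 4 or 5.
    share_high = sum(r >= 4 for r in round2) / len(round2)
    return meets_consensus(round2) or share_high >= 0.8
```

    Under this reading, an item that misses consensus at the in-person meeting can still be retained if the online re-rating is concentrated at 4 and 5.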

  13. 2012 Workplace and Gender Relations Survey of Active Duty Members. Tabulations of Responses

    Science.gov (United States)

    2013-04-01

    enjoyment, and the opportunity to acquire valuable skills ... 15. Overall, how well prepared ... 87. Suppose you see a Service member, who you do not know very well, getting drunk at a party. Someone tells you that one of ... The composite measure includes survey items on sense of pride, use of skills, work enjoyment, and the opportunity to acquire valuable skills (Q14a-d

  14. Item Analysis of Short Test in Educational Testing: Comparative Study on Parameter and Non-parameter Item Response Theory

    Institute of Scientific and Technical Information of China (English)

    何壮; 袁淑莉; 赵守盈

    2012-01-01

    Topic-based tests and short tests are a major form of item construction in educational testing. Such tests can be analyzed with both parametric and nonparametric item response theory (IRT). This study applied the Rasch model and the Mokken model to the geography section of a comprehensive liberal arts examination taken by senior three students and compared the results. Rasch analysis was run in Winsteps and Xcalibre to obtain difficulty, information, and differential item functioning parameters; Mokken analysis was run in MSP to obtain proportion-correct values and homogeneity coefficients. Comparing the two sets of results led to four conclusions: (1) ordering items by proportion correct under nonparametric IRT agreed with ordering them by difficulty under parametric IRT; (2) a few items that did not meet the parametric IRT criteria still contributed to test quality and should not be deleted; (3) for dimensionality checks and item selection, the nonparametric IRT criteria were stricter than the parametric ones; (4) the two approaches gave consistent differential item functioning results.

  15. Alternate item types: continuing the quest for authentic testing.

    Science.gov (United States)

    Wendt, Anne; Kenny, Lorraine E

    2009-03-01

    Many test developers suggest that multiple-choice items can be used to evaluate critical thinking if the items are focused on measuring higher order thinking ability. The literature supports the use of alternate item types to assess additional competencies, such as higher level cognitive processing and critical thinking, as well as ways to allow examinees to demonstrate their competencies differently. This research study surveyed nurses after they took a test composed of alternate item types paired with multiple-choice items. The participants were asked to provide opinions regarding the items and the item formats. Demographic information was also requested. In addition, information was collected as the participants responded to the items. The results of this study reveal that the participants thought that, in general, the items were more authentic and allowed them to demonstrate their competence better than multiple-choice items did. Further investigation into the optimal blend of alternate items and multiple-choice items is needed.

  16. Can Lottery Incentives Boost Web Survey Response Rates? Findings from Four Experiments

    Science.gov (United States)

    Laguilles, Jerold S.; Williams, Elizabeth A.; Saunders, Daniel B.

    2011-01-01

    Institutions of higher education rely on student surveys for a number of purposes, including planning, assessment, and research. Web surveys are especially prevalent given their ease of use and low-cost; yet, obtaining a high response rate is a challenge. Although researchers have investigated the use of incentives in traditional mail surveys,…

  17. Evaluating Reasons for Low Response from Mail Surveys. AIR 1995 Annual Forum Paper.

    Science.gov (United States)

    Westcott, S. Wickes, III; And Others

    A study was undertaken to solicit opinions from alumni on methods that might improve responses from graduate surveys. Two telephone surveys were conducted, one in 1991 which targeted the graduating classes of 1984 and 1989, and the second in 1994 among alumni of the classes of 1991 and 1993. In the 1994 survey information was gathered regarding…

  18. Grouping of Items in Mobile Web Questionnaires

    Science.gov (United States)

    Mavletova, Aigul; Couper, Mick P.

    2016-01-01

    There is some evidence that a scrolling design may reduce breakoffs in mobile web surveys compared to a paging design, but there is little empirical evidence to guide the choice of the optimal number of items per page. We investigate the effect of the number of items presented on a page on data quality in two types of questionnaires: with or…

  19. Surveys of Health Professions Trainees: Prevalence, Response Rates, and Predictive Factors to Guide Researchers.

    Science.gov (United States)

    Phillips, Andrew W; Friedman, Benjamin T; Utrankar, Amol; Ta, Andrew Q; Reddy, Shalini T; Durning, Steven J

    2017-02-01

    To establish a baseline overall response rate for surveys of health professions trainees, determine strategies associated with improved response rates, and evaluate for the presence of nonresponse bias. The authors performed a comprehensive analysis of all articles published in Academic Medicine, Medical Education, and Advances in Health Sciences Education in 2013, recording response rates. Additionally, they reviewed nonresponse bias analyses and factors suggested in other fields to affect response rate including survey delivery method, prenotification, and incentives. The search yielded 732 total articles; of these, 356 were research articles, and of these, 185 (52.0%) used at least one survey. Of these, 66 articles (35.6%) met inclusion criteria and yielded 73 unique surveys. Of the 73 surveys used, investigators reported a response rate for 63.0% of them; response rates ranged from 26.6% to 100%, mean (standard deviation) 71.3% (19.5%). Investigators reported using incentives for only 16.4% of the 73 surveys. The only survey methodology factor significantly associated with response rate was single- vs. multi-institutional surveys (respectively, 74.6% [21.2%] vs. 62.0% [12.8%], P = .022). Notably, statistical power for all analyses was limited. No articles evaluated for nonresponse bias. Approximately half of the articles evaluated used a survey as part of their methods. Limited data are available to establish a baseline response rate among health professions trainees and inform researchers which strategies are associated with higher response rates. Journals publishing survey-based health professions education research should improve reporting of response rate, nonresponse bias, and other survey factors.

  20. Faculty perspectives of the undergraduate laboratory: A survey of faculty goals for the laboratory and comparative analysis of responses using statistical techniques

    Science.gov (United States)

    Bruck, Aaron D.

    Qualitative research methods were used in a previous study to discover the goals of faculty members teaching undergraduate laboratories. Assertions about the goals and the unique characteristics of innovative lab programs were developed from categories that emerged from the interviews. The purpose of the present research was to create a survey instrument to measure the prevalence of these themes and faculty goals for undergraduate laboratories with a national sample. This was achieved through a two-stage process that utilized a pilot survey to determine the factor structure and reduce the number of survey items to a manageable size. Once the number of survey questions was reduced, the full survey was given to a national sample of undergraduate laboratory faculty. The 312 responses to the survey were then analyzed using factor analysis. Comparative analyses were conducted using analysis of variance (ANOVA). This dissertation focuses on the processes involved in the creation of this survey and the subsequent analyses of the data the survey produced. The results of these analyses and the implications of this research will also be discussed.

  1. Screening Test Items for Differential Item Functioning

    Science.gov (United States)

    Longford, Nicholas T.

    2014-01-01

    A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

  2. DSM-5 alternative personality disorder model traits as maladaptive extreme variants of the five-factor model: An item-response theory analysis.

    Science.gov (United States)

    Suzuki, Takakuni; Samuel, Douglas B; Pahlen, Shandell; Krueger, Robert F

    2015-05-01

    Over the past two decades, evidence has suggested that personality disorders (PDs) can be conceptualized as extreme, maladaptive variants of general personality dimensions, rather than discrete categorical entities. Recognizing this literature, the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) alternative PD model in Section III defines PDs partially through 25 maladaptive traits that fall within 5 domains. Empirical evidence based on the self-report measure of these traits, the Personality Inventory for DSM-5 (PID-5), suggests that these five higher-order domains share a structure and correlate in meaningful ways with the five-factor model (FFM) of general personality. In the current study, item response theory was used to compare the DSM-5 alternative PD model traits to those from a normative FFM inventory (the International Personality Item Pool-NEO [IPIP-NEO]) in terms of their measurement precision along the latent dimensions. Within a combined sample of 3,517 participants, results strongly supported the conclusion that the DSM-5 alternative PD model traits and IPIP-NEO traits are complementary measures of 4 of the 5 FFM domains (with perhaps the exception of openness to experience vs. psychoticism). Importantly, the two measures yield largely overlapping information curves on these four domains. Differences that did emerge suggested that the PID-5 scales generally have higher thresholds and provide more information at the upper levels, whereas the IPIP-NEO generally had an advantage at the lower levels. These results support the general conceptualization that 4 domains of the DSM-5 alternative PD model traits are maladaptive, extreme versions of the FFM.

  3. Using a retrospective pretest instead of a conventional pretest is replacing biases: a qualitative study of cognitive processes underlying responses to thentest items.

    Science.gov (United States)

    Taminiau-Bloem, Elsbeth F; Schwartz, Carolyn E; van Zuuren, Florence J; Koeneman, Margot A; Visser, Mechteld R M; Tishelman, Carol; Koning, Caro C E; Sprangers, Mirjam A G

    2016-06-01

    The thentest design aims to detect and control for recalibration response shift. This design assumes (1) more consistency in the content of the cognitive processes underlying patients' quality of life (QoL) between posttest and thentest assessments than between posttest and pretest assessments; and (2) consistency in the time frame and description of functioning referenced at pretest and thentest. Our objective is to utilize cognitive interviewing to qualitatively examine both assumptions. We conducted think-aloud interviews with 24 patients with cancer prior to and after radiotherapy to elicit cognitive processes underlying their assessment of seven EORTC QLQ-C30 items at pretest, posttest and thentest. We used an analytic scheme based on the cognitive process models of Tourangeau et al. and Rapkin and Schwartz that yielded five cognitive processes. We subsequently used this input for quantitative analysis of count data. Contrary to expectation, the number of dissimilar cognitive processes between posttest and thentest was generally larger than between pretest and posttest across patients. Further, patients considered a range of time frames when answering the thentest questions. Moreover, patients' description at the thentest of their pretest functioning was often not similar to that which was noted at pretest. Items referring to trouble taking a short walk, overall health and QoL were most often violating the assumptions. Both assumptions underlying the thentest design appear not to be supported by the patients' cognitive processes. Replacing the conventional pretest-posttest design with the thentest design may simply be replacing one set of biases with another.

  4. The Impact of Lottery Incentives on Student Survey Response Rates.

    Science.gov (United States)

    Porter, Stephen R.; Whitcomb, Michael E.

    2003-01-01

    A controlled experiment tested the effects of lottery incentives using a prospective college applicant Web survey, with emails sent to more than 9,000 high school students. Found minimal effect of postpaid incentives for increasing levels of incentive. (EV)

  5. Motivation in Business Survey Response Behavior : Influencing motivation to improve survey outcome

    NARCIS (Netherlands)

    Torres van Grinsven, V.

    2015-01-01

    In this dissertation we show theoretical and empirical insights into the concept of motivation in the context of the business and organizational survey task. The research has led to a number of recommendations on how to improve organizational survey and communication design to enhance motivation and

  7. Principles and procedures of considering item sequence effects in the development of calibrated item pools: Conceptual analysis and empirical illustration

    Directory of Open Access Journals (Sweden)

    Safir Yousfi

    2012-12-01

    Item responses can be context-sensitive. Consequently, composing test forms flexibly from a calibrated item pool requires considering potential context effects. This paper focuses on context effects that are related to the item sequence. It is argued that sequence effects are not necessarily a violation of item response theory but that item response theory offers a powerful tool to analyze them. If sequence effects are substantial, test forms cannot be composed flexibly on the basis of a calibrated item pool, which precludes applications like computerized adaptive testing. In contrast, minor sequence effects do not thwart applications of calibrated item pools. Strategies to minimize the detrimental impact of sequence effects on item parameters are discussed and integrated into a nomenclature that addresses the major features of item calibration designs. An example of an item calibration design demonstrates how this nomenclature can guide the process of developing a calibrated item pool.

  8. AN ITEM ANALYSIS OF EPQ ON THE ITEM RESPONSE THEORY

    Institute of Scientific and Technical Information of China (English)

    杨建原; 何壮; 赵守盈

    2012-01-01

    It is 30 years since Eysenck's Personality Questionnaire (EPQ) was first introduced to China, and the latest norms were published in 2000. After heavy exposure over the past 10 years, its applicability needs to be tested empirically again with present-day samples. The aim of this paper is to analyze the properties of the EPQ items under item response theory (IRT), focusing on the measurement accuracy of the items. EPQ data from the 2011 freshman cohort of one university were analyzed in MULTILOG 7.03 using marginal maximum likelihood estimation and the two-parameter logistic model. Item difficulty, item discrimination, and information curves were examined in detail for each item and each subscale. The results indicated that the data met the basic assumptions of IRT, including unidimensionality, monotonicity, and parameter invariance. The difficulty and discrimination of most EPQ items met the theoretical requirements, which suggests that the revision of the questionnaire was quite successful. However, the E, P, and N subscales provided only limited information at the cut-off scores, which makes it hard to discriminate well among respondents, and the total information of each of the three subscales fell short of the theoretical requirements; applying the questionnaire to assessment or intervention could therefore lead to many erroneous judgments. As an essential complement to classical test theory (CTT) for item analysis, IRT will be widely used in psychological and educational testing research in the future.
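    For readers unfamiliar with the quantities named in this abstract, the standard two-parameter logistic (2PL) model can be written out as follows. This is the textbook form, given here as a reference sketch; the abstract does not state whether a scaling constant (such as D = 1.7) was applied, so none is shown. Here θ is the latent trait, a_i the discrimination of item i, and b_i its difficulty.

```latex
% Probability of endorsing item i at trait level \theta (2PL model)
P_i(\theta) = \frac{1}{1 + \exp\bigl(-a_i(\theta - b_i)\bigr)}

% Item information, and test (or subscale) information as the sum over items
I_i(\theta) = a_i^{2}\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr),
\qquad
I(\theta) = \sum_{i} I_i(\theta)
```

    Because each item's information peaks at θ = b_i, a subscale whose item difficulties lie far from a cut-off score contributes little information there, which is the kind of limitation the abstract reports for the E, P, and N subscales.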

  9. Reexamining traditional issues in survey research: Just how evil is the anathema of low response rate?

    Energy Technology Data Exchange (ETDEWEB)

    Clark, S.B. [Oak Ridge Institute for Science and Education, TN (United States). Science/Engineering Education Division]; Boser, J.A. [Univ. of Tennessee, Knoxville, TN (United States)]

    1995-08-01

    Survey researchers have long been exhorted to strive for high response rates in order to maximize the likelihood that the respondents are representative of the population being surveyed. It is not surprising then, that much survey research has been directed towards examining the effects of various manipulatable factors on response rate. It is clear that attempts to reach the goal of minimizing the likelihood of nonresponse bias through testing various methods of increasing survey response rates have consumed much research and debate. The results obtained in this research have been inconsistent. Some studies have found significant differences, others have found none. The present study was designed to determine the extent to which the results of an employment survey of former graduates of a teacher preparation program would have been affected by changes in response rate.

  10. Quality of life and discriminating power of two questionnaires in fibromyalgia patients: Fibromyalgia Impact Questionnaire and Medical Outcomes Study 36-Item Short-Form Health Survey

    Directory of Open Access Journals (Sweden)

    Ana Assumpção

    2010-08-01

    BACKGROUND: Fibromyalgia is a painful syndrome characterized by widespread chronic pain and associated symptoms with a negative impact on quality of life. OBJECTIVES: Considering the subjectivity of quality of life measurements, the aim of this study was to verify the discriminating power of two quality of life questionnaires in patients with fibromyalgia: the generic Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) and the specific Fibromyalgia Impact Questionnaire (FIQ). METHODS: A cross-sectional study was conducted on 150 participants divided into Fibromyalgia Group (FG) and Control Group (CG) (n=75 in each group). The participants were evaluated using the SF-36 and the FIQ. The data were analyzed by the Student t-test (α=0.05) and inferential analysis using the Receiver Operating Characteristics (ROC) Curve: sensitivity, specificity and area under the curve (AUC). The significance level was 0.05. RESULTS: The sample was similar for age (CG: 47.8±8.1; FG: 47.0±7.7 years). A significant difference was observed in quality of life assessment in all aspects of both questionnaires (p

  11. The effect of multiple reminders on response patterns in a Danish health survey

    DEFF Research Database (Denmark)

    Christensen, Anne I; Ekholm, Ola; Kristensen, Peter L;

    2015-01-01

    BACKGROUND: Reminders are routinely applied in surveys to increase response rates and reduce the possibility of bias. This study examines the effect of multiple reminders on the response rate, non-response bias, prevalence estimates and exposure-outcome relations in a national self-administered health survey. METHODS: Data derive from the Danish National Health Survey 2010, in which 298 550 individuals (16 years of age or older) were invited to participate in a cross-sectional survey using a mixed-mode approach (paper and web questionnaires). At least two reminders were sent to non-respondents, and 177 639 individuals completed the questionnaire (59.5%). Response patterns were compared between four groups of individuals (first mailing respondents, second mailing respondents, third mailing respondents and non-respondents). RESULTS: Multiple reminders led to an increase in response rate from 36...

  12. [Response rates in three opinion surveys performed through online questionnaires in the health setting].

    Science.gov (United States)

    Aerny Perreten, Nicole; Domínguez-Berjón, Ma Felicitas; Astray Mochales, Jenaro; Esteban-Vasallo, María D; Blanco Ancos, Luis Miguel; Lópaz Pérez, Ma Ángeles

    2012-01-01

    The main advantages of online questionnaires are the speed of data collection and cost savings, but response rates are usually low. This study analyzed response rates and associated factors among health professionals in three opinion surveys in the autonomous region of Madrid. The participants, length of the questionnaire and topic differed among the three surveys. The surveys were conducted by using paid Internet software. The institutional e-mail addresses of distinct groups of health professionals were used. Response rates were highest in hospitals (up to 63%) and administrative services and were lowest in primary care (less than 33%). The differences in response rates were analyzed in primary care professionals according to age, sex and professional category and only the association with age was statistically significant. None of the surveys achieved a response rate of 60%. Differences were observed according to workplace, patterns of Internet usage, and interest in the subject. Copyright © 2011 SESPAS. Published by Elsevier Espana. All rights reserved.

  13. Invariance Testing of the SF-36 Health Survey in Women Breast Cancer Survivors: Do Personal and Cancer-Related Variables Influence the Meaning of Quality of Life Items?

    Science.gov (United States)

    Mosewich, Amber D.; Hadd, Valerie; Crocker, Peter R. E.; Zumbo, Bruno D.

    2013-01-01

    Quality of life (QoL) is affected by issues specific to illness trajectory and thus, may differ, and potentially take on different meanings, at different stages in the cancer process. A widely used measure of QoL is the SF-36 Health Survey (SF-36; Ware 1993); therefore, support for its appropriateness in a given population is imperative. The…

  15. How to compare scores from different depression scales: equating the Patient Health Questionnaire (PHQ) and the ICD-10-Symptom Rating (ISR) using Item Response Theory.

    Science.gov (United States)

    Fischer, H Felix; Tritt, Karin; Klapp, Burghard F; Fliege, Herbert

    2011-12-01

    A wide range of questionnaires for measuring depression are available. Item Response Theory models can help to evaluate questionnaires beyond the limits of Classical Test Theory and provide an opportunity to equate them. In this study, after checking for unidimensionality, a Generalized Partial Credit Model was applied to data from two different depression scales [Patient Health Questionnaire (PHQ-9) and ICD-10-Symptom Rating (ISR)] obtained in clinical settings from a consecutive sample, including 4517 observations from a total of 2999 inpatients and outpatients of a psychosomatic clinic. The precision of each questionnaire was compared and the model was used to transform scores based on the assumed underlying latent trait. Both instruments were constructed to measure the same construct and their estimates of depression severity are highly correlated. Our analysis showed that the predicted scores provided by the conversion tables are similar to the observed scores in a validation sample. The PHQ-9 and ISR depression scales measure depression severity across a broad range with similar precision. While the PHQ-9 shows advantages in measuring low or high depression severity, the ISR is more parsimonious and also suitable for clinical purposes. Furthermore, the equating tables derived in this study enhance the comparability of studies using either one of the instruments, but due to substantial statistical spread the comparison of individual scores is imprecise.
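
    As a rough illustration of the model family named in this abstract, the sketch below computes category probabilities under a generalized partial credit model for a single item. It is a textbook form of the model, not the authors' fitted model: the function names, the discrimination value and the step thresholds are hypothetical, and no parameters from the study are used.

```python
import math


def gpcm_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under a generalized partial credit model.

    theta          -- latent trait value (e.g. depression severity)
    discrimination -- item slope parameter a
    thresholds     -- step parameters b_1..b_m for an item scored 0..m
    """
    # Cumulative logits: category 0 has an empty sum, i.e. 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + discrimination * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(theta, discrimination, thresholds):
    """Model-implied expected score on the item at a given theta."""
    probs = gpcm_probs(theta, discrimination, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical parameters for a four-category item (scores 0..3).
probs = gpcm_probs(theta=0.5, discrimination=1.2, thresholds=[-1.0, 0.0, 1.5])
```

    Summing expected item scores over all items of each scale at the same theta is one way a score conversion table between two instruments could be built, assuming both are calibrated on a common latent metric.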

  16. The Professional Context as a Predictor for Response Distortion in the Adaption-Innovation Inventory--An Investigation Using Mixture Distribution Item Response Theory Models

    Science.gov (United States)

    Fischer, Sebastian; Freund, Philipp Alexander

    2014-01-01

    The Adaption-Innovation Inventory (AII), originally developed by Kirton (1976), is a widely used self-report instrument for measuring problem-solving styles at work. The present study investigates how scores on the AII are affected by different response styles. Data are collected from a combined sample (N = 738) of students, employees, and…

  17. Understanding Low Survey Response Rates Among Young U.S. Military Personnel

    Science.gov (United States)

    2015-01-01

    two years of the RAND survey. We were unable to include some surveys, such as the 2010 Navy Pregnancy and Parenthood Survey, because response rates... stratified random sampling approach, oversampling women (including sampling all women in the Marine Corps) and oversampling men in the Marine Corps (DMDC... during the first week of basic military training, every Air Force recruit completed a behavioral risk questionnaire on such topics as smoking, alcohol use

  18. 2012 Survey of Active Duty Spouses: Tabulations of Responses

    Science.gov (United States)

    2013-09-30

    k. My work unit produces high quality products and services... telework preference. 3. Permanent Change of Station (PCS) Moves: Number of spouse moves, length of time since most recent PCS move, length of time... you agree or disagree with the following statements about your workplace? k. My work unit produces high quality products and services. 1. Strongly

  19. Reversed item bias: an integrative model.

    Science.gov (United States)

    Weijters, Bert; Baumgartner, Hans; Schillewaert, Niels

    2013-09-01

    In the recent methodological literature, various models have been proposed to account for the phenomenon that reversed items (defined as items for which respondents' scores have to be recoded in order to make the direction of keying consistent across all items) tend to lead to problematic responses. In this article we propose an integrative conceptualization of three important sources of reversed item method bias (acquiescence, careless responding, and confirmation bias) and specify a multisample confirmatory factor analysis model with 2 method factors to empirically test the hypothesized mechanisms, using explicit measures of acquiescence and carelessness and experimentally manipulated versions of a questionnaire that varies 3 item arrangements and the keying direction of the first item measuring the focal construct. We explain the mechanisms, review prior attempts to model reversed item bias, present our new model, and apply it to responses to a 4-item self-esteem scale (N = 306) and the 6-item Revised Life Orientation Test (N = 595). Based on the literature review and the empirical results, we formulate recommendations on how to use reversed items in questionnaires.
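
    The recoding that defines a reversed item in this abstract can be shown with a short sketch. The function names, item labels and the 1-to-5 response scale are assumptions chosen for illustration; they are not taken from the article or from the scales it analyses.

```python
def reverse_score(score, scale_min=1, scale_max=5):
    """Recode a reverse-keyed response so that high values mean the same
    thing as on the regular items: x -> scale_min + scale_max - x."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} outside {scale_min}..{scale_max}")
    return scale_min + scale_max - score


def scale_total(responses, reversed_items, scale_min=1, scale_max=5):
    """Sum item responses after recoding the reverse-keyed ones.

    responses      -- dict mapping item label to raw response
    reversed_items -- set of labels for reverse-keyed items
    """
    total = 0
    for item, score in responses.items():
        if item in reversed_items:
            score = reverse_score(score, scale_min, scale_max)
        total += score
    return total


# Hypothetical 4-item scale in which items q2 and q4 are reverse-keyed.
raw = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
total = scale_total(raw, reversed_items={"q2", "q4"})
```

    A respondent who agrees with every item regardless of keying (acquiescence) gives high raw answers to both regular and reversed items, so the recoded reversed items pull the total toward the scale midpoint.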

  20. Item response drift in the Family Affluence Scale: A study on three consecutive surveys of the Health Behaviour in School-aged Children (HBSC) survey

    DEFF Research Database (Denmark)

    Schnohr, Christina Warrer; Makransky, Guido; Kreiner, Svend

    2013-01-01

    Comparable data on socio-economic position (SEP) is essential to international studies on health inequalities. The Health Behaviour in School-aged Children (HBSC) has used the Family Affluence Scale (FAS) on material assets. The present study used data collected from adolescents in eight countries...