WorldWideScience

Sample records for single test item

  1. Evolution of a Test Item

    Science.gov (United States)

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  2. Selecting Items for Criterion-Referenced Tests.

    Science.gov (United States)

    Mellenbergh, Gideon J.; van der Linden, Wim J.

    1982-01-01

    Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
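    The two classical statistics named in this abstract, item difficulty and item-test correlation, can be illustrated with a minimal sketch. It is not taken from the cited paper: the function names and the small 0/1 score matrix are hypothetical, and the correlation shown is the plain (uncorrected) item-total Pearson correlation.

```python
from statistics import mean, pstdev

def item_difficulty(responses, item):
    """Classical item difficulty: proportion of examinees scoring 1 on the item."""
    return mean(row[item] for row in responses)

def item_total_correlation(responses, item):
    """Pearson correlation between the item score and the total test score."""
    x = [row[item] for row in responses]
    y = [sum(row) for row in responses]
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable has no variance
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

# Hypothetical scored data: rows are examinees, columns are items (1 = correct).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

for j in range(3):
    print(j, item_difficulty(scores, j), round(item_total_correlation(scores, j), 3))
```

    In practice the item is often removed from the total before correlating (a corrected item-rest correlation); the uncorrected form is kept here only because it is the simplest to read.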

  3. Item Analysis in Introductory Economics Testing.

    Science.gov (United States)

    Tinari, Frank D.

    1979-01-01

    Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)

  4. Single-item memory, associative memory, and the human hippocampus

    OpenAIRE

    Gold, Jeffrey J.; Hopkins, Ramona O.; Squire, Larry R.

    2006-01-01

    We tested recognition memory for items and associations in memory-impaired patients with bilateral lesions thought to be limited to the hippocampal region. In Experiment 1 (Combined memory test), participants studied words and then took a memory test in which studied words, new words, studied word pairs, and recombined word pairs were presented in a mixed order. In Experiment 2 (Separated memory test), participants studied single words and then took a memory test involving studied word and ne...

  5. Binomial test models and item difficulty

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1979-01-01

    In choosing a binomial test model, it is important to know exactly what conditions are imposed on item difficulty. In this paper these conditions are examined for both a deterministic and a stochastic conception of item responses. It appears that they are more restrictive than is generally

  6. Algorithmic test design using classical item parameters

    NARCIS (Netherlands)

    van der Linden, Willem J.; Adema, Jos J.

    Two optimalization models for the construction of tests with a maximal value of coefficient alpha are given. Both models have a linear form and can be solved by using a branch-and-bound algorithm. The first model assumes an item bank calibrated under the Rasch model and can be used, for instance,

  7. Bayesian item selection criteria for adaptive testing

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1996-01-01

    R.J. Owen (1975) proposed an approximate empirical Bayes procedure for item selection in adaptive testing. The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational

  8. Item calibration in incomplete testing designs

    Directory of Open Access Journals (Sweden)

    Norman D. Verhelst

    2011-01-01

    This study discusses the justifiability of item parameter estimation in incomplete testing designs in item response theory. Marginal maximum likelihood (MML) as well as conditional maximum likelihood (CML) procedures are considered in three commonly used incomplete designs: random incomplete, multistage testing and targeted testing designs. Mislevy and Sheehan (1989) have shown that in incomplete designs the justifiability of MML can be deduced from Rubin's (1976) general theory on inference in the presence of missing data. Their results are recapitulated and extended for more situations. In this study it is shown that for CML estimation the justification must be established in an alternative way, by considering the neglected part of the complete likelihood. The problems with incomplete designs are not generally recognized in practical situations. This is due to the stochastic nature of the incomplete designs which is not taken into account in standard computer algorithms. For that reason, incorrect uses of standard MML- and CML-algorithms are discussed.

  9. Using automatic item generation to create multiple-choice test items.

    Science.gov (United States)

    Gierl, Mark J; Lai, Hollis; Turner, Simon R

    2012-08-01

    Many tests of medical knowledge, from the undergraduate level to the level of certification and licensure, contain multiple-choice items. Although these are efficient in measuring examinees' knowledge and skills across diverse content areas, multiple-choice items are time-consuming and expensive to create. Changes in student assessment brought about by new forms of computer-based testing have created the demand for large numbers of multiple-choice items. Our current approaches to item development cannot meet this demand. We present a methodology for developing multiple-choice items based on automatic item generation (AIG) concepts and procedures. We describe a three-stage approach to AIG and we illustrate this approach by generating multiple-choice items for a medical licensure test in the content area of surgery. To generate multiple-choice items, our method requires a three-stage process. Firstly, a cognitive model is created by content specialists. Secondly, item models are developed using the content from the cognitive model. Thirdly, items are generated from the item models using computer software. Using this methodology, we generated 1248 multiple-choice items from one item model. Automatic item generation is a process that involves using models to generate items using computer technology. With our method, content specialists identify and structure the content for the test items, and computer technology systematically combines the content to generate new test items. By combining these outcomes, items can be generated automatically. © Blackwell Publishing Ltd 2012.
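    The third stage described above, in which software systematically combines content to generate items from an item model, can be sketched as follows. This is only an illustration under stated assumptions: the stem template, the element names and their values are hypothetical placeholders, not the cognitive model or item model used by the authors, and distractor generation is left out.

```python
from itertools import product

# Hypothetical item model: a stem with placeholders and the allowed values
# for each placeholder (in AIG these would come from content specialists).
STEM = "A {age}-year-old patient reports {finding}. Which action is most appropriate?"
ELEMENTS = {
    "age": [25, 60],
    "finding": ["finding A", "finding B", "finding C"],
}

def generate_items(stem, elements):
    """Fill the stem with every combination of element values."""
    keys = list(elements)
    combos = product(*(elements[k] for k in keys))
    return [stem.format(**dict(zip(keys, combo))) for combo in combos]

items = generate_items(STEM, ELEMENTS)
for text in items:
    print(text)
```

    The number of generated stems is the product of the number of values per element (here 2 x 3 = 6), which is how a single item model can yield a large number of items.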

  10. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  11. The development of a single-item Food Choice Questionnaire

    NARCIS (Netherlands)

    Onwezen, M.C.; Reinders, M.J.; Verain, M.C.D.; Snoek, H.M.

    2019-01-01

    Based on the multi-item Food Choice Questionnaire (FCQ) originally developed by Steptoe and colleagues (1995), the current study developed a single-item FCQ that provides an acceptable balance between practical needs and psychometric concerns. Studies 1 (N = 1851) and 2 (2a (N = 3290), 2b (N =

  12. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    Science.gov (United States)

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…

  13. ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

    Science.gov (United States)

    Australian Council for Educational Research, Hawthorn.

    The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…

  14. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    Science.gov (United States)

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…

  15. Development and validation of the Single Item Narcissism Scale (SINS).

    Science.gov (United States)

    Konrath, Sara; Meier, Brian P; Bushman, Brad J

    2014-01-01

    The narcissistic personality is characterized by grandiosity, entitlement, and low empathy. This paper describes the development and validation of the Single Item Narcissism Scale (SINS). Although the use of longer instruments is superior in most circumstances, we recommend the SINS in some circumstances (e.g. under serious time constraints, online studies). In 11 independent studies (total N = 2,250), we demonstrate the SINS' psychometric properties. The SINS is significantly correlated with longer narcissism scales, but uncorrelated with self-esteem. It also has high test-retest reliability. We validate the SINS in a variety of samples (e.g., undergraduates, nationally representative adults), intrapersonal correlates (e.g., positive affect, depression), and interpersonal correlates (e.g., aggression, relationship quality, prosocial behavior). The SINS taps into the more fragile and less desirable components of narcissism. The SINS can be a useful tool for researchers, especially when it is important to measure narcissism with constraints preventing the use of longer measures.

  16. Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.

    Science.gov (United States)

    Commons, C., Ed.; Martin, P., Ed.

    Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…

  17. Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

    Science.gov (United States)

    Aybek, Eren Can; Demirtasli, R. Nukhet

    2017-01-01

    This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…

  18. Single-Item Measurement of Suicidal Behaviors: Validity and Consequences of Misclassification.

    Directory of Open Access Journals (Sweden)

    Alexander J Millner

    Suicide is a leading cause of death worldwide. Although research has made strides in better defining suicidal behaviors, there has been less focus on accurate measurement. Currently, the widespread use of self-report, single-item questions to assess suicide ideation, plans and attempts may contribute to measurement problems and misclassification. We examined the validity of single-item measurement and the potential for statistical errors. Over 1,500 participants completed an online survey containing single-item questions regarding a history of suicidal behaviors, followed by questions with more precise language, multiple response options and narrative responses to examine the validity of single-item questions. We also conducted simulations to test whether common statistical tests are robust against the degree of misclassification produced by the use of single-items. We found that 11.3% of participants that endorsed a single-item suicide attempt measure engaged in behavior that would not meet the standard definition of a suicide attempt. Similarly, 8.8% of those who endorsed a single-item measure of suicide ideation endorsed thoughts that would not meet standard definitions of suicide ideation. Statistical simulations revealed that this level of misclassification substantially decreases statistical power and increases the likelihood of false conclusions from statistical tests. Providing a wider range of response options for each item reduced the misclassification rate by approximately half. Overall, the use of single-item, self-report questions to assess the presence of suicidal behaviors leads to misclassification, increasing the likelihood of statistical decision errors. Improving the measurement of suicidal behaviors is critical to increase understanding and prevention of suicide.

  19. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    Science.gov (United States)

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  20. Guide to good practices for the development of test items

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-01-01

    While the methodology used in developing test items can vary significantly, to ensure quality examinations, test items should be developed systematically. Test design and development is discussed in the DOE Guide to Good Practices for Design, Development, and Implementation of Examinations. This guide is intended to be a supplement by providing more detailed guidance on the development of specific test items. This guide addresses the development of written examination test items primarily. However, many of the concepts also apply to oral examinations, both in the classroom and on the job. This guide is intended to be used as guidance for the classroom and laboratory instructor or curriculum developer responsible for the construction of individual test items. This document focuses on written test items, but includes information relative to open-reference (open book) examination test items, as well. These test items have been categorized as short-answer, multiple-choice, or essay. Each test item format is described, examples are provided, and a procedure for development is included. The appendices provide examples for writing test items, a test item development form, and examples of various test item formats.

  1. Item Response Theory Models for Performance Decline during Testing

    Science.gov (United States)

    Jin, Kuan-Yu; Wang, Wen-Chung

    2014-01-01

    Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…

  2. Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  3. Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  4. Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  5. Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  6. Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  7. Electronics. Criterion-Referenced Test (CRT) Item Bank.

    Science.gov (United States)

    Davis, Diane, Ed.

    This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…

  8. Assessing difference between classical test theory and item ...

    African Journals Online (AJOL)

    Assessing difference between classical test theory and item response theory methods in scoring primary four multiple choice objective test items. ... All research participants were ranked on the CTT number correct scores and the corresponding IRT item pattern scores from their performance on the PRISMADAT. Wilcoxon ...

  9. ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).

    Science.gov (United States)

    Australian Council for Educational Research, Hawthorn.

    This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…

  10. Development and Validation of the Single Item Narcissism Scale (SINS)

    Science.gov (United States)

    Konrath, Sara; Meier, Brian P.; Bushman, Brad J.

    2014-01-01

    Main Objectives: The narcissistic personality is characterized by grandiosity, entitlement, and low empathy. This paper describes the development and validation of the Single Item Narcissism Scale (SINS). Although the use of longer instruments is superior in most circumstances, we recommend the SINS in some circumstances (e.g. under serious time constraints, online studies). Methods: In 11 independent studies (total N = 2,250), we demonstrate the SINS' psychometric properties. Results: The SINS is significantly correlated with longer narcissism scales, but uncorrelated with self-esteem. It also has high test-retest reliability. We validate the SINS in a variety of samples (e.g., undergraduates, nationally representative adults), intrapersonal correlates (e.g., positive affect, depression), and interpersonal correlates (e.g., aggression, relationship quality, prosocial behavior). The SINS taps into the more fragile and less desirable components of narcissism. Significance: The SINS can be a useful tool for researchers, especially when it is important to measure narcissism with constraints preventing the use of longer measures. PMID:25093508

  11. Development and validation of the Single Item Narcissism Scale (SINS).

    Directory of Open Access Journals (Sweden)

    Sara Konrath

    MAIN OBJECTIVES: The narcissistic personality is characterized by grandiosity, entitlement, and low empathy. This paper describes the development and validation of the Single Item Narcissism Scale (SINS). Although the use of longer instruments is superior in most circumstances, we recommend the SINS in some circumstances (e.g. under serious time constraints, online studies). METHODS: In 11 independent studies (total N = 2,250), we demonstrate the SINS' psychometric properties. RESULTS: The SINS is significantly correlated with longer narcissism scales, but uncorrelated with self-esteem. It also has high test-retest reliability. We validate the SINS in a variety of samples (e.g., undergraduates, nationally representative adults), intrapersonal correlates (e.g., positive affect, depression), and interpersonal correlates (e.g., aggression, relationship quality, prosocial behavior). The SINS taps into the more fragile and less desirable components of narcissism. SIGNIFICANCE: The SINS can be a useful tool for researchers, especially when it is important to measure narcissism with constraints preventing the use of longer measures.

  12. The Role of Item Feedback in Self-Adapted Testing.

    Science.gov (United States)

    Roos, Linda L.; And Others

    1997-01-01

    The importance of item feedback in self-adapted testing was studied by comparing feedback and no feedback conditions for computerized adaptive tests and self-adapted tests taken by 363 college students. Results indicate that item feedback is not necessary to realize score differences between self-adapted and computerized adaptive testing. (SLD)

  13. Effect of Differential Item Functioning on Test Equating

    Science.gov (United States)

    Kabasakal, Kübra Atalay; Kelecioglu, Hülya

    2015-01-01

    This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…

  14. Computerized adaptive testing item selection in computerized adaptive learning systems

    NARCIS (Netherlands)

    Eggen, Theodorus Johannes Hendrikus Maria; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Item selection methods traditionally developed for computerized adaptive testing (CAT) are explored for their usefulness in item-based computerized adaptive learning (CAL) systems. While in CAT Fisher information-based selection is optimal, for recovering learning populations in CAL systems item
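    Fisher information-based selection, which this abstract names as the standard CAT criterion, picks the unused item that is most informative at the current ability estimate. The sketch below assumes a two-parameter logistic (2PL) model purely for illustration; the abstract does not state which IRT model the authors use, and the item bank values and function names are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Index of the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta, *bank[i]))

# Hypothetical item bank: (discrimination a, difficulty b) per item.
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 1.0)]

first = select_next_item(0.0, bank, administered=set())
second = select_next_item(0.0, bank, administered={first})
print(first, second)
```

    With equal discriminations, the rule reduces to choosing the item whose difficulty lies closest to the current ability estimate.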

  15. Criteria for eliminating items of a Test of Figural Analogies

    Directory of Open Access Journals (Sweden)

    Diego Blum

    2013-12-01

    This paper describes the steps taken to eliminate two of the items in a Test of Figural Analogies (TFA). The main guidelines of psychometric analysis concerning Classical Test Theory (CTT) and Item Response Theory (IRT) are explained. The item elimination process was based on both the study of the CTT difficulty and discrimination index, and the unidimensionality analysis. The a, b, and c parameters of the Three Parameter Logistic Model of IRT were also considered for this purpose, as well as the assessment of each item fitting this model. The unfavourable characteristics of a group of TFA items are detailed, and decisions leading to their possible elimination are discussed.
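    For readers unfamiliar with the a, b, and c parameters mentioned above, the following sketch evaluates the standard three-parameter logistic response function. The function name and the example parameter values are hypothetical, and the optional scaling constant D = 1.7 used in some formulations is omitted.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability
    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty, 20% guessing floor.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=0.2), 3))
```

    At theta equal to b the probability is halfway between c and 1, which is why b is read as the item's difficulty and c as the chance of success for very low-ability examinees.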

  16. Detection of differential item functioning using Lagrange multiplier tests

    NARCIS (Netherlands)

    Glas, Cornelis A.W.

    1996-01-01

    In this paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or C. R. Rao's efficient score test. The test is presented in the framework of a number of item response theory (IRT) models such as the Rasch model, the one-parameter logistic model, the

  17. A person fit test for IRT models for polytomous items

    NARCIS (Netherlands)

    Glas, Cornelis A.W.; Dagohoy, A.V.

    2007-01-01

    A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability

  18. Algorithms for computerized test construction using classical item parameters

    NARCIS (Netherlands)

    Adema, Jos J.; van der Linden, Willem J.

    1989-01-01

    Recently, linear programming models for test construction were developed. These models were based on the information function from item response theory. In this paper another approach is followed. Two 0-1 linear programming models for the construction of tests using classical item and test

  19. Procedures for Selecting Items for Computerized Adaptive Tests.

    Science.gov (United States)

    Kingsbury, G. Gage; Zara, Anthony R.

    1989-01-01

    Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)

  20. Detecting Test Tampering Using Item Response Theory

    Science.gov (United States)

    Wollack, James A.; Cohen, Allan S.; Eckerly, Carol A.

    2015-01-01

    Test tampering, especially on tests for educational accountability, is an unfortunate reality, necessitating that the state (or its testing vendor) perform data forensic analyses, such as erasure analyses, to look for signs of possible malfeasance. Few statistical approaches exist for detecting fraudulent erasures, and those that do largely do not…

  1. Item selection and ability estimation in adaptive testing

    NARCIS (Netherlands)

    Pashley, Peter J.; van der Linden, Wim J.; van der Linden, Willem J.; Glas, Cornelis A.W.; Glas, Cees A.W.

    2010-01-01

    The last century saw a tremendous progression in the refinement and use of standardized linear tests. The first administered College Board exam occurred in 1901 and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, progressively more sophisticated standardized linear tests

  2. Quantitative penetration testing with item response theory

    NARCIS (Netherlands)

    Pieters, W.; Arnold, F.; Stoelinga, M.I.A.

    2013-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Therefore, penetration testing has thus far been used as a qualitative research method. To enable quantitative approaches to security risk management,

  3. Quantitative Penetration Testing with Item Response Theory

    NARCIS (Netherlands)

    Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle Ida Antoinette

    2014-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Thus, penetration testing has so far been used as a qualitative research method. To enable quantitative approaches to security risk management, including

  4. Quantitative penetration testing with item response theory

    NARCIS (Netherlands)

    Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle

    2013-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Thus, penetration testing has so far been used as a qualitative research method. To enable quantitative approaches to security risk management, including

  5. Group differences in the heritability of items and test scores

    NARCIS (Netherlands)

    Wicherts, J.M.; Johnson, W.

    2009-01-01

    It is important to understand potential sources of group differences in the heritability of intelligence test scores. On the basis of a basic item response model we argue that heritabilities which are based on dichotomous item scores normally do not generalize from one sample to the next. If groups

  6. Mathematical-programming approaches to test item pool design

    NARCIS (Netherlands)

    Veldkamp, Bernard P.; van der Linden, Willem J.; Ariel, A.

    2002-01-01

    This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming

  7. Item Response Theory Modeling of the Philadelphia Naming Test

    Science.gov (United States)

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.

    2015-01-01

    Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…

  8. Evaluating an Automated Number Series Item Generator Using Linear Logistic Test Models

    Directory of Open Access Journals (Sweden)

    Bao Sheng Loe

    2018-04-01

    This study investigates the item properties of a newly developed Automatic Number Series Item Generator (ANSIG). The foundation of the ANSIG is based on five hypothesised cognitive operators. Thirteen item models were developed using the numGen R package and eleven were evaluated in this study. The 16-item ICAR (International Cognitive Ability Resource) short form ability test was used to evaluate construct validity. The Rasch Model and two Linear Logistic Test Models (LLTM) were employed to estimate and predict the item parameters. Results indicate that a single factor determines the performance on tests composed of items generated by the ANSIG. Under the LLTM approach, all the cognitive operators were significant predictors of item difficulty. Moderate to high correlations were evident between the number series items and the ICAR test scores, with high correlation found for the ICAR Letter-Numeric-Series type items, suggesting adequate nomothetic span. Extended cognitive research is, nevertheless, essential for the automatic generation of an item pool with predictable psychometric properties.

  9. Item response times in computerized adaptive testing

    Directory of Open Access Journals (Sweden)

    Lutz F. Hornke

    2000-01-01

    Full Text Available Item response times in computerized adaptive tests. Computerized adaptive tests (CAT) provide scores and, at the same time, item response times. Research on the additional meaning that can be drawn from the information contained in response times is of particular interest. Data from 5912 young people who took a computerized adaptive test were available. Earlier studies indicate longer response times when responses are incorrect. This result was replicated in this larger study. However, the mean item response times for incorrect and correct responses do not yield an interpretation different from the one obtained with trait levels, nor do they correlate differently with a number of ability tests. Whether response times should be interpreted on the same dimension that the CAT measures or on other dimensions is discussed. Since the early 1930s, response times have been regarded as indicators of personality traits that should be distinguished from the traits measured by test scores. This idea is discussed, and arguments for and against it are offered. More recent model-based approaches are also presented. The question remains open whether or not additional diagnostic information is obtained from a CAT with detailed, programmed data collection.

  10. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    Science.gov (United States)

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism.

  11. A psychometric comparison of three scales and a single-item measure to assess sexual satisfaction.

    Science.gov (United States)

    Mark, Kristen P; Herbenick, Debby; Fortenberry, J Dennis; Sanders, Stephanie; Reece, Michael

    2014-01-01

    This study was designed to systematically compare and contrast the psychometric properties of three scales developed to measure sexual satisfaction and a single-item measure of sexual satisfaction. The Index of Sexual Satisfaction (ISS), Global Measure of Sexual Satisfaction (GMSEX), and the New Sexual Satisfaction Scale-Short (NSSS-S) were compared to one another and to a single-item measure of sexual satisfaction. Conceptualization of the constructs, distribution of scores, internal consistency, convergent validity, test-retest reliability, and factor structure were compared between the measures. A total of 211 men and 214 women completed the scales and a measure of relationship satisfaction, with 33% (n = 139) of the sample reassessed two months later. All scales demonstrated appropriate distribution of scores and adequate internal consistency. The GMSEX, NSSS-S, and the single-item measure demonstrated convergent validity. Test-retest reliability was demonstrated by the ISS, GMSEX, and NSSS-S, but not the single-item measure. Taken together, the GMSEX received the strongest psychometric support in this sample for a unidimensional measure of sexual satisfaction and the NSSS-S received the strongest psychometric support in this sample for a bidimensional measure of sexual satisfaction.

  12. Item response theory analysis of the mechanics baseline test

    Science.gov (United States)

    Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

    2012-02-01

    Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
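
The role of the two item parameters named in this abstract (difficulty and discrimination) can be illustrated with a minimal sketch. It assumes the common two-parameter logistic (2PL) form, which the abstract does not confirm as the model actually fitted; the function names and the item values below are hypothetical.

```python
import math

def prob_correct_2pl(theta, a, b):
    """Probability of a correct answer under an assumed 2PL model.

    theta: student skill, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def discrimination_gap(a, b, low=-1.0, high=1.0):
    """Difference in success probability between a high- and a low-skill student."""
    return prob_correct_2pl(high, a, b) - prob_correct_2pl(low, a, b)

# Hypothetical items with the same difficulty but different discrimination.
weak_gap = discrimination_gap(a=0.3, b=0.0)
strong_gap = discrimination_gap(a=2.0, b=0.0)
print(f"low-discrimination item gap:  {weak_gap:.3f}")
print(f"high-discrimination item gap: {strong_gap:.3f}")
```

Under this assumed form, an item with a small discrimination value separates students of different skill only weakly, which is one way a question could be flagged as ineffective.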

  13. A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

    Science.gov (United States)

    Polak, Marike; De Rooij, Mark; Heiser, Willem J.

    2012-01-01

    In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…

  14. Detection of differential item functioning using Lagrange multiplier tests

    NARCIS (Netherlands)

    Glas, Cornelis A.W.

    1998-01-01

    Abstract: In the present paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or Rao’s efficient score test. The test is presented in the framework of a number of IRT models such as the Rasch model, the OPLM, the 2-parameter logistic model, the

  15. Face validity of the single work ability item

    DEFF Research Database (Denmark)

    Gupta, Nidhi; Jensen, Bjørn Søvsø; Søgaard, Karen

    2014-01-01

    PURPOSE: The purpose of this study was to investigate the face validity of the self-reported single item work ability with objectively measured heart rate reserve (%HRR) among blue-collar workers. METHODS: We utilized data from 127 blue-collar workers (Female = 53; Male = 74) aged 18-65 years from... with a total of 5,810 h, including 2,640 working hours. RESULTS: A significant moderate correlation between work ability and %HRR was observed among males (R = -0.33, P = 0.005), but not among females (R = 0.11, P = 0.431). In a gender-stratified multi-adjusted logistic regression analysis, males with high %HRR were more likely to report a reduced work ability compared to males with low %HRR [OR = 4.75, 95% confidence interval (95% CI) = 1.31 to 17.25]. However, this association was not found among females (OR = 0.26, 95% CI 0.03 to 2.16), and a significant interaction between work ability, %HRR...

  16. Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

    Science.gov (United States)

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.

  17. Differential Weighting of Items to Improve University Admission Test Validity

    Directory of Open Access Journals (Sweden)

    Eduardo Backhoff Escudero

    2001-05-01

    Full Text Available This paper gives an evaluation of different ways to increase university admission test criterion-related validity, by differentially weighting test items. We compared four methods of weighting multiple-choice items of the Basic Skills and Knowledge Examination (EXHCOBA): (1) punishing incorrect responses by a constant factor, (2) weighting incorrect responses, considering the levels of error, (3) weighting correct responses, considering the item’s difficulty, based on the Classic Measurement Theory, and (4) weighting correct responses, considering the item’s difficulty, based on the Item Response Theory. Results show that none of these methods increased the instrument’s predictive validity, although they did improve its concurrent validity. It was concluded that it is appropriate to score the test by simply adding up correct responses.

  18. A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating

    Directory of Open Access Journals (Sweden)

    Michalis P Michaelides

    2010-10-01

    Full Text Available Many studies have investigated the topic of change or drift in item parameter estimates in the context of Item Response Theory. Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  19. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

    Science.gov (United States)

    Michaelides, Michalis P

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  20. Assessing Differential Item Functioning on the Test of Relational Reasoning

    Directory of Open Access Journals (Sweden)

    Denis Dumas

    2018-03-01

    Full Text Available The test of relational reasoning (TORR) is designed to assess the ability to identify complex patterns within visuospatial stimuli. The TORR is designed for use in school and university settings, and therefore, its measurement invariance across diverse groups is critical. In this investigation, a large sample, representative of a major university on key demographic variables, was collected, and the resulting data were analyzed using a multi-group, multidimensional item-response theory model-comparison procedure. No significant differential item functioning was found on any of the TORR items across any of the demographic groups of interest. This finding is interpreted as evidence of the cultural fairness of the TORR, and potential test-development choices that may have contributed to that cultural fairness are discussed.

  1. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    Science.gov (United States)

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

    As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…

  2. Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

    Science.gov (United States)

    Smith, Clifton L.; And Others

    This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…

  3. The Single-Item Math Anxiety Scale: An Alternative Way of Measuring Mathematical Anxiety

    Science.gov (United States)

    Núñez-Peña, M. Isabel; Guilera, Georgina; Suárez-Pellicioni, Macarena

    2014-01-01

    This study examined whether the Single-Item Math Anxiety Scale (SIMA), based on the item suggested by Ashcraft, provided valid and reliable scores of mathematical anxiety. A large sample of university students (n = 279) was administered the SIMA and the 25-item Shortened Math Anxiety Rating Scale (sMARS) to evaluate the relation between the scores…

  4. Bayes Factor Covariance Testing in Item Response Models.

    Science.gov (United States)

    Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

    2017-12-01

    Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.

  5. Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

    Science.gov (United States)

    Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

    2017-06-15

    Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.

  6. Work ability as prognostic risk marker of disability pension : Single-item work ability score versus multi-item work ability index

    NARCIS (Netherlands)

    Roelen, C.A.M.; Rhenen, van W.; Groothoff, J.W.; Klink, van der J.J.L.; Twisk, W.R.; Heymans, M.W.

    2014-01-01

    Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP.

  7. Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index

    NARCIS (Netherlands)

    Roelen, C.A.M.; van Rhenen, W.; Groothoff, J.W.; van der Klink, J.J.L.; Twisk, J.W.R.; Heymans, M.W.

    2014-01-01

    Objectives Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. Methods This

  8. Work ability as prognostic risk marker of disability pension : single-item work ability score versus multi-item work ability index

    NARCIS (Netherlands)

    Roelen, Corne A. M.; van Rhenen, Willem; Groothoff, Johan W.; van der Klink, Jac J. L.; Twisk, Jos W. R.; Heymans, Martijn W.

    Objectives Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. Methods This

  9. The Technical Quality of Test Items Generated Using a Systematic Approach to Item Writing.

    Science.gov (United States)

    Siskind, Theresa G.; Anderson, Lorin W.

    The study was designed to examine the similarity of response options generated by different item writers using a systematic approach to item writing. The similarity of response options to student responses for the same item stems presented in an open-ended format was also examined. A non-systematic (subject matter expertise) approach and a…

  10. Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  11. Assessing the Validity of Single-item Life Satisfaction Measures: Results from Three Large Samples

    Science.gov (United States)

    Cheung, Felix; Lucas, Richard E.

    2014-01-01

    Purpose: The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS), a more psychometrically established measure. Methods: Two large samples from Washington (N=13,064) and Oregon (N=2,277) recruited by the Behavioral Risk Factor Surveillance System (BRFSS) and a representative German sample (N=1,312) recruited by the Germany Socio-Economic Panel (GSOEP) were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Results: Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62–0.64; disattenuated r = 0.78–0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001–0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015–0.042). Conclusions: Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use. PMID:24890827
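
The difference between the zero-order and the disattenuated correlations reported in this abstract follows from the classical correction for attenuation, sketched below. The abstract does not state the reliability estimates that were used, so the input values here are purely hypothetical placeholders and are not the study's data.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation.

    r_xy: observed (zero-order) correlation between two measures.
    rel_x, rel_y: reliability estimates of the two measures (0 < rel <= 1).
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical values chosen only to show the direction of the correction.
observed_r = 0.60
corrected_r = disattenuate(observed_r, rel_x=0.64, rel_y=1.0)
print(f"observed r = {observed_r:.2f}, disattenuated r = {corrected_r:.2f}")
```

Because reliabilities are at most 1, the corrected correlation is never smaller in magnitude than the observed one.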

  12. Assessing the validity of single-item life satisfaction measures: results from three large samples.

    Science.gov (United States)

    Cheung, Felix; Lucas, Richard E

    2014-12-01

    The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS), a more psychometrically established measure. Two large samples from Washington (N = 13,064) and Oregon (N = 2,277) recruited by the Behavioral Risk Factor Surveillance System and a representative German sample (N = 1,312) recruited by the Germany Socio-Economic Panel were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62-0.64; disattenuated r = 0.78-0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001-0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015-0.042). Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use.

  13. Development of a lack of appetite item bank for computer-adaptive testing (CAT)

    DEFF Research Database (Denmark)

    Thamsborg, Lise Laurberg Holst; Petersen, Morten Aa; Aaronson, Neil K

    2015-01-01

    to 12 lack of appetite items. CONCLUSIONS: Phases 1-3 resulted in 12 lack of appetite candidate items. Based on a field testing (phase 4), the psychometric characteristics of the items will be assessed and the final item bank will be generated. This CAT item bank is expected to provide precise...

  14. Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index.

    Science.gov (United States)

    Roelen, Corné A M; van Rhenen, Willem; Groothoff, Johan W; van der Klink, Jac J L; Twisk, Jos W R; Heymans, Martijn W

    2014-07-01

    Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. This prospective cohort study comprised 11 537 male construction workers, who completed the WAI at baseline and reported DP after a mean 2.3 years of follow-up. WAS and WAI were calibrated for DP risk predictions with the Hosmer-Lemeshow (H-L) test and their ability to discriminate between high- and low-risk construction workers was investigated with the area under the receiver operating characteristic curve (AUC). At follow-up, 336 (3%) construction workers reported DP. Both WAS [odds ratio (OR) 0.72, 95% confidence interval (95% CI) 0.66-0.78] and WAI (OR 0.57, 95% CI 0.52-0.63) scores were associated with DP at follow-up. The WAS showed miscalibration (H-L model χ²=10.60; df=3; P=0.01) and poorly discriminated between high- and low-risk construction workers (AUC 0.67, 95% CI 0.64-0.70). In contrast, calibration (H-L model χ²=8.20; df=8; P=0.41) and discrimination (AUC 0.78, 95% CI 0.75-0.80) were both adequate for the WAI. Although associated with the risk of future DP, the single-item WAS poorly identified male construction workers at risk of DP. We recommend using the multi-item WAI to screen for risk of DP in occupational health practice.

  15. An emotional functioning item bank of 24 items for computerized adaptive testing (CAT) was established

    DEFF Research Database (Denmark)

    Petersen, Morten Aa.; Gamper, Eva-Maria; Costantini, Anna

    2016-01-01

    of the widely used EORTC Quality of Life questionnaire (QLQ-C30). STUDY DESIGN AND SETTING: On the basis of literature search and evaluations by international samples of experts and cancer patients, 38 candidate items were developed. The psychometric properties of the items were evaluated in a large...... international sample of cancer patients. This included evaluations of dimensionality, item response theory (IRT) model fit, differential item functioning (DIF), and of measurement precision/statistical power. RESULTS: Responses were obtained from 1,023 cancer patients from four countries. The evaluations showed...... that 24 items could be included in a unidimensional IRT model. DIF did not seem to have any significant impact on the estimation of EF. Evaluations indicated that the CAT measure may reduce sample size requirements by up to 50% compared to the QLQ-C30 EF scale without reducing power. CONCLUSION...

  16. Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

    Science.gov (United States)

    Lee, Yi-Hsuan; Zhang, Jinming

    2017-01-01

    Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

  17. Stochastic order in dichotomous item response models for fixed tests, research adaptive tests, or multiple abilities

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1995-01-01

    Dichotomous item response theory (IRT) models can be viewed as families of stochastically ordered distributions of responses to test items. This paper explores several properties of such distributions. The focus is on the conditions under which stochastic order in families of conditional

  18. Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

    2016-01-01

    Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…

  19. An Effect Size Measure for Raju's Differential Functioning for Items and Tests

    Science.gov (United States)

    Wright, Keith D.; Oshima, T. C.

    2015-01-01

    This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…

  20. Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.

    Science.gov (United States)

    Commons, C., Ed.; Martin, P., Ed.

    The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…

  1. Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

    Science.gov (United States)

    Wang, Wei

    2013-01-01

    Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

  2. A simple and fast item selection procedure for adaptive testing

    NARCIS (Netherlands)

    Veerkamp, W.J.J.; Veerkamp, Wim J.J.; Berger, Martijn; Berger, Martijn P.F.

    1994-01-01

    Items with the highest discrimination parameter values in a logistic item response theory (IRT) model do not necessarily give maximum information. This paper shows which discrimination parameter values (as a function of the guessing parameter and the distance between person ability and item
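
The opening claim of this abstract, that the most discriminating item does not necessarily give maximum information, can be checked numerically with a minimal sketch. It assumes the standard three-parameter logistic (3PL) item information function without a scaling constant; the paper's own procedure is not reproduced here, and all parameter values are hypothetical.

```python
import math

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta.

    a: discrimination, b: difficulty, c: guessing parameter.
    Assumes P = c + (1 - c) * logistic(a * (theta - b)), no scaling constant.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * logistic
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

# Two hypothetical items with equal difficulty and guessing parameters.
guess = 0.2
for distance in (0.0, 2.0):  # distance between person ability and item difficulty
    low_a = info_3pl(distance, a=1.0, b=0.0, c=guess)
    high_a = info_3pl(distance, a=2.5, b=0.0, c=guess)
    print(f"distance {distance}: info(a=1.0)={low_a:.4f}, info(a=2.5)={high_a:.4f}")
```

With these illustrative values, the more discriminating item is more informative when ability matches difficulty, but less informative when ability lies two units above it.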

  3. Cross-National Prevalence of Traditional Bullying, Traditional Victimization, Cyberbullying and Cyber-Victimization: Comparing Single-Item and Multiple-Item Approaches of Measurement

    Science.gov (United States)

    Yanagida, Takuya; Gradinger, Petra; Strohmeier, Dagmar; Solomontos-Kountouri, Olga; Trip, Simona; Bora, Carmen

    2016-01-01

    Many large-scale cross-national studies rely on a single-item measurement when comparing prevalence rates of traditional bullying, traditional victimization, cyberbullying, and cyber-victimization between countries. However, the reliability and validity of single-item measurement approaches are highly problematic and might be biased. Data from…

  4. Item difficulty of multiple choice tests dependant on different item response formats – An experiment in fundamental research on psychological assessment

    Directory of Open Access Journals (Sweden)

    KLAUS D. KUBINGER

    2007-12-01

    Multiple choice response formats are problematic because an item is often scored as solved simply because the test-taker is a lucky guesser. Instead of applying pertinent IRT models which take guessing effects into account, a pragmatic approach of re-conceptualizing multiple choice response formats to reduce the chance of lucky guessing is considered. This paper compares the free response format with two different multiple choice formats. A common multiple choice format with a single correct response option and five distractors (“1 of 6”) is used, as well as a multiple choice format with five response options, of which any number of the five is correct and the item is only scored as mastered if all the correct response options and none of the wrong ones are marked (“x of 5”). An experiment was designed, using pairs of items with exactly the same content but different response formats. 173 test-takers were randomly assigned to two test booklets of 150 items altogether. Rasch model analyses adduced a fitting item pool, after the deletion of 39 items. The resulting item difficulty parameters were used for the comparison of the different formats. The multiple choice format “1 of 6” differs significantly from “x of 5”, with a relative effect of 1.63, while the multiple choice format “x of 5” does not significantly differ from the free response format. Therefore, the lower degree of difficulty of items with the “1 of 6” multiple choice format is an indicator of relevant guessing effects. In contrast, the “x of 5” multiple choice format can be seen as an appropriate substitute for the free response format.
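
    The difference in guessing chance between the two multiple choice formats can be made concrete with a little arithmetic. The sketch below is not from the paper; it assumes a test-taker who guesses blindly and picks every possible answer pattern with equal probability, and the function names are hypothetical.

```python
from fractions import Fraction


def chance_one_of_six() -> Fraction:
    """Blind-guess success rate with one correct option among six."""
    return Fraction(1, 6)


def chance_x_of_five(allow_empty: bool = True) -> Fraction:
    """Blind-guess success rate when each of five options may or may not be correct.

    Exactly one marking pattern out of 2**5 is scored as mastered. If the
    pattern with no option marked is ruled out (an assumption, not stated
    in the abstract), 31 patterns remain.
    """
    patterns = 2 ** 5
    if not allow_empty:
        patterns -= 1
    return Fraction(1, patterns)


print(chance_one_of_six())       # 1/6
print(chance_x_of_five())        # 1/32
print(chance_x_of_five(False))   # 1/31
```

    Under these assumptions a lucky guess is roughly five times less likely in the “x of 5” format than in the “1 of 6” format.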

  5. Detection of person misfit in computerized adaptive tests with polytomous items

    NARCIS (Netherlands)

    van Krimpen-Stoop, Edith; Meijer, R.R.

    2000-01-01

    Item scores that do not fit an assumed item response theory model may cause the latent trait value to be estimated inaccurately. For computerized adaptive tests (CAT) with dichotomous items, several person-fit statistics for detecting nonfitting item score patterns have been proposed. Both for

  6. Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

    Science.gov (United States)

    Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

    2013-01-01

    Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…

  7. Developing a Model for Optimizing Inventory of Repairable Items at Single Operating Base

    OpenAIRE

    Le, Tin

    2016-01-01

    The use of the EOQ model in inventory management is popular. However, EOQ models have many disadvantages, especially when the model is applied to manage repairable items. In order to deal with high-cost and repairable items, Craig C. Sherbrooke introduced a model in his book “Optimal Inventory Modeling of Systems: Multi-Echelon Techniques”. The research focus is to implement and develop a program to execute the single-site inventory model for repairable items. The model helps to significantl...

  8. Bayes factor covariance testing in item response models

    NARCIS (Netherlands)

    Fox, J.P.; Mulder, J.; Sinharay, Sandip

    2017-01-01

    Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning

  9. Bayes Factor Covariance Testing in Item Response Models

    NARCIS (Netherlands)

    Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

    2017-01-01

    Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning

  10. Single-item measure for assessing quality of life in children with drug-resistant epilepsy.

    Science.gov (United States)

    Conway, Lauryn; Widjaja, Elysa; Smith, Mary Lou

    2018-03-01

    The current study investigated the psychometric properties of a single-item quality of life (QOL) measure, the Global Quality of Life in Childhood Epilepsy question (G-QOLCE), in children with drug-resistant epilepsy. Data came from the Impact of Pediatric Epilepsy Surgery on Health-Related Quality of Life Study (PESQOL), a multicenter prospective cohort study (n = 118) with observations collected at baseline and at 6 months of follow-up on children aged 4-18 years. QOL was measured with the QOLCE-76 and KIDSCREEN-27. The G-QOLCE was an overall QOL question derived from the QOLCE-76. Construct validity and reliability were assessed with Spearman's correlation and intraclass correlation coefficient (ICC). Responsiveness was examined through distribution-based and anchor-based methods. The G-QOLCE showed moderate (r ≥ 0.30) to strong (r ≥ 0.50) correlations with composite scores, and most subscales of the QOLCE-76 and KIDSCREEN-27 at baseline and 6-month follow-up. The G-QOLCE had moderate test-retest reliability (ICC range: 0.49-0.72) and was able to detect clinically important change in patients' QOL (standardized response mean: 0.38; probability of change: 0.65; Guyatt's responsiveness statistics: 0.62 and 0.78). Caregiver anxiety and family functioning contributed most strongly to G-QOLCE scores over time. Results offer promising preliminary evidence regarding the validity, reliability, and responsiveness of the proposed single-item QOL measure. The G-QOLCE is a potentially useful tool that can be feasibly administered in a busy clinical setting to evaluate clinical status and impact of treatment outcomes in pediatric epilepsy.

  11. Creating a Database for Test Items in National Examinations (pp ...

    African Journals Online (AJOL)

    Nekky Umera

    different programmers create files and application programs over a long period. .... In theory or essay questions, alternative methods of solving problems are explored and ... Unworthy items are those that do not focus on the central concept or.

  12. Working memory for sequences of temporal durations reveals a volatile single-item store

    Directory of Open Access Journals (Sweden)

    Sanjay G Manohar

    2016-10-01

    When a sequence is held in working memory, different items are retained with differing fidelity. Here we ask whether a sequence of brief time intervals that must be remembered shows recency effects, similar to those observed in verbal and visuospatial working memory. It has been suggested that prioritising some items over others can be accounted for by a focus of attention, maintaining some items in a privileged state. We therefore also investigated whether such benefits are vulnerable to disruption by attention or expectation. Participants listened to sequences of one to five tones of varying durations (200 ms to 2 s). Subsequently, the length of one of the tones in the sequence had to be reproduced by holding a key. The discrepancy between the reproduced and actual durations quantified the fidelity of memory for auditory durations. Recall precision decreased with the number of items that had to be remembered, and was better for the first and last items of sequences, in line with set-size and serial position effects seen in other modalities. To test whether attentional filtering demands might impair performance, an irrelevant variation in pitch was introduced in some blocks of trials. In those blocks, memory precision was worse for sequences that consisted of only one item, i.e. the smallest memory set size. Thus, when irrelevant information was present, the benefit of having only one item in memory was attenuated. Finally, we examined whether expectation could interfere with memory. On half the trials, the number of items in the upcoming sequence was cued. When the number of items was known in advance, performance was paradoxically worse when the sequence consisted of only one item. Thus the benefit of having only one item to remember is stronger when it is unexpectedly the only item. Our results suggest that similar mechanisms are used to hold auditory time durations in working memory as for visual or verbal stimuli. Further, solitary items were

  13. A more general model for testing measurement invariance and differential item functioning.

    Science.gov (United States)

    Bauer, Daniel J

    2017-09-01

    The evaluation of measurement invariance is an important step in establishing the validity and comparability of measurements across individuals. Most commonly, measurement invariance has been examined using one of two primary latent variable modeling approaches: the multiple groups model or the multiple-indicator multiple-cause (MIMIC) model. Both approaches offer opportunities to detect differential item functioning within multi-item scales, and thereby to test measurement invariance, but both approaches also have significant limitations. The multiple groups model allows one to examine the invariance of all model parameters but only across levels of a single categorical individual difference variable (e.g., ethnicity). In contrast, the MIMIC model permits both categorical and continuous individual difference variables (e.g., sex and age) but permits only a subset of the model parameters to vary as a function of these characteristics. The current article argues that moderated nonlinear factor analysis (MNLFA) constitutes an alternative, more flexible model for evaluating measurement invariance and differential item functioning. We show that the MNLFA subsumes and combines the strengths of the multiple group and MIMIC models, allowing for a full and simultaneous assessment of measurement invariance and differential item functioning across multiple categorical and/or continuous individual difference variables. The relationships between the MNLFA model and the multiple groups and MIMIC models are shown mathematically and via an empirical demonstration. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  14. International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

    Science.gov (United States)

    Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

    2016-01-01

    We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…

  15. The "Sniffin' Kids" test--a 14-item odor identification test for children.

    Directory of Open Access Journals (Sweden)

    Valentin A Schriever

    Tools for measuring olfactory function in adults have been well established. Although studies have shown that olfactory impairment in children may occur as a consequence of a number of diseases or head trauma, until today no consensus on how to evaluate the sense of smell in children exists in Europe. The aim of the study was to develop a modified "Sniffin' Sticks" odor identification test, the "Sniffin' Kids" test, for use in children. In this study 537 children between 6-17 years of age were included. Fourteen odors, which were identified at a high rate by children, were selected from the "Sniffin' Sticks" 16-item odor identification test. Normative data for the 14-item "Sniffin' Kids" odor identification test were obtained. The test was validated by including a group of congenital anosmic children. Results show that the "Sniffin' Kids" test is able to discriminate between normosmia and anosmia with a cutoff value of >7 points on the odor identification test. In addition, the test-retest reliability was investigated in a group of 31 healthy children and shown to be ρ = 0.44. With the 14-item odor identification "Sniffin' Kids" test we present a valid and reliable test for measuring olfactory function in children between ages 6-17 years.

  16. The utility of single-item readiness screeners in middle school.

    Science.gov (United States)

    Lewis, Crystal G; Herman, Keith C; Huang, Francis L; Stormont, Melissa; Grossman, Caroline; Eddy, Colleen; Reinke, Wendy M

    2017-10-01

    This study examined the benefit of utilizing one-item academic and one-item behavior readiness teacher-rated screeners at the beginning of the school year to predict end-of-school year outcomes for middle school students. The Middle School Academic and Behavior Readiness (M-ABR) screeners were developed to provide an efficient and effective way to assess readiness in students. Participants included 889 students in 62 middle school classrooms in an urban Missouri school district. Concurrent validity with the M-ABR items and other indicators of readiness in the fall were evaluated using Pearson product-moment correlation coefficients, with the academic readiness item having medium to strong correlations with other baseline academic indicators (r=±0.56 to 0.91) and the behavior readiness item having low to strong correlations with baseline behavior items (r=±0.20 to 0.79). Next, the predictive validity of the M-ABR items was analyzed with hierarchical linear regressions using end-of-year outcomes as the dependent variable. The academic and behavior readiness items demonstrated adequate validity for all outcomes with moderate effects (β=±0.31 to 0.73 for academic outcomes and β=±0.24 to 0.59 for behavioral outcomes) after controlling for baseline demographics. Even after controlling for baseline scores, the M-ABR items predicted unique variance in almost all outcome variables. Four conditional probability indices were calculated to obtain an optimal cut score, to determine ready vs. not ready, for both single-item M-ABR scales. The cut point of "fair" yielded the most acceptable values for the indices. The odds ratios (OR) of experiencing negative outcomes given a "fair" or lower readiness rating (2 or below on the M-ABR screeners) at the beginning of the year were significant and strong for all outcomes (OR=2.29 to OR=14.46), except for internalizing problems. These findings suggest promise for using single readiness items to screen for varying negative end

  17. Single-item screening for agoraphobic symptoms : validation of a web-based audiovisual screening instrument

    NARCIS (Netherlands)

    van Ballegooijen, Wouter; Riper, Heleen; Donker, Tara; Martin Abello, Katherina; Marks, Isaac; Cuijpers, Pim

    2012-01-01

    The advent of web-based treatments for anxiety disorders creates a need for quick and valid online screening instruments, suitable for a range of social groups. This study validates a single-item multimedia screening instrument for agoraphobia, part of the Visual Screener for Common Mental Disorders

  18. The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

    Science.gov (United States)

    Kaskowitz, Gary S.; De Ayala, R. J.

    2001-01-01

    Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…

  19. Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.

    Science.gov (United States)

    Zhu, Renbang; Yu, Feng; Liu, Su

    A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…

  20. Development of Test Items Related to Selected Concepts Within the Scheme the Particle Nature of Matter.

    Science.gov (United States)

    Doran, Rodney L.; Pella, Milton O.

    The purpose of this study was to develop test items with a minimum reading demand for use with pupils at grade levels two through six. An item was judged to be acceptable if the item satisfied at least four of six criteria. Approximately 250 students in grades 2-6 participated in the study. Half of the students were given instruction to develop…

  1. SINGLE HEATER TEST FINAL REPORT

    Energy Technology Data Exchange (ETDEWEB)

    J.B. Cho

    1999-05-01

    The Single Heater Test is the first of the in-situ thermal tests conducted by the U.S. Department of Energy as part of its program of characterizing Yucca Mountain in Nevada as the potential site for a proposed deep geologic repository for the disposal of spent nuclear fuel and high-level nuclear waste. The Site Characterization Plan (DOE 1988) contained an extensive plan of in-situ thermal tests aimed at understanding specific aspects of the response of the local rock-mass around the potential repository to the heat from the radioactive decay of the emplaced waste. With the refocusing of the Site Characterization Plan by the "Civilian Radioactive Waste Management Program Plan" (DOE 1994), a consolidated thermal testing program emerged by 1995 as documented in the reports "In-Situ Thermal Testing Program Strategy" (DOE 1995) and "Updated In-Situ Thermal Testing Program Strategy" (CRWMS M&O 1997a). The concept of the Single Heater Test took shape in the summer of 1995 and detailed planning and design of the test started with the beginning fiscal year 1996. The overall objective of the Single Heater Test was to gain an understanding of the coupled thermal, mechanical, hydrological, and chemical processes that are anticipated to occur in the local rock-mass in the potential repository as a result of heat from radioactive decay of the emplaced waste. This included making a priori predictions of the test results using existing models and subsequently refining or modifying the models, on the basis of comparative and interpretive analyses of the measurements and predictions. A second, no less important, objective was to try out, in a full-scale field setting, the various instruments and equipment to be employed in the future on a much larger, more complex, thermal test of longer duration, such as the Drift Scale Test. This "shake down" or trial aspect of the Single Heater Test applied

  2. SINGLE HEATER TEST FINAL REPORT

    International Nuclear Information System (INIS)

    J.B. Cho

    1999-01-01

    The Single Heater Test is the first of the in-situ thermal tests conducted by the U.S. Department of Energy as part of its program of characterizing Yucca Mountain in Nevada as the potential site for a proposed deep geologic repository for the disposal of spent nuclear fuel and high-level nuclear waste. The Site Characterization Plan (DOE 1988) contained an extensive plan of in-situ thermal tests aimed at understanding specific aspects of the response of the local rock-mass around the potential repository to the heat from the radioactive decay of the emplaced waste. With the refocusing of the Site Characterization Plan by the "Civilian Radioactive Waste Management Program Plan" (DOE 1994), a consolidated thermal testing program emerged by 1995 as documented in the reports "In-Situ Thermal Testing Program Strategy" (DOE 1995) and "Updated In-Situ Thermal Testing Program Strategy" (CRWMS M and O 1997a). The concept of the Single Heater Test took shape in the summer of 1995 and detailed planning and design of the test started with the beginning fiscal year 1996. The overall objective of the Single Heater Test was to gain an understanding of the coupled thermal, mechanical, hydrological, and chemical processes that are anticipated to occur in the local rock-mass in the potential repository as a result of heat from radioactive decay of the emplaced waste. This included making a priori predictions of the test results using existing models and subsequently refining or modifying the models, on the basis of comparative and interpretive analyses of the measurements and predictions. A second, no less important, objective was to try out, in a full-scale field setting, the various instruments and equipment to be employed in the future on a much larger, more complex, thermal test of longer duration, such as the Drift Scale Test. This "shake down" or trial aspect of the Single Heater Test applied not just to the hardware, but also to the teamwork and cooperation between

  3. Projective Item Response Model for Test-Independent Measurement

    Science.gov (United States)

    Ip, Edward Hak-Sing; Chen, Shyh-Huei

    2012-01-01

    The problem of fitting unidimensional item-response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that contains a major dimension of interest but that may also contain minor nuisance dimensions. Because fitting a unidimensional model to multidimensional data results in…

  4. An empirical comparison of Item Response Theory and Classical Test Theory

    Directory of Open Access Journals (Sweden)

    Špela Progar

    2008-11-01

    Based on nonlinear models between the measured latent variable and the item response, item response theory (IRT) enables independent estimation of item and person parameters and local estimation of measurement error. These properties of IRT are also the main theoretical advantages of IRT over classical test theory (CTT). Empirical evidence, however, often failed to discover consistent differences between IRT and CTT parameters and between invariance measures of CTT and IRT parameter estimates. In this empirical study a real data set from the Third International Mathematics and Science Study (TIMSS 1995) was used to address the following questions: (1) How comparable are CTT and IRT based item and person parameters? (2) How invariant are CTT and IRT based item parameters across different participant groups? (3) How invariant are CTT and IRT based item and person parameters across different item sets? The findings indicate that the CTT and the IRT item/person parameters are very comparable, that the CTT and the IRT item parameters show similar invariance property when estimated across different groups of participants, that the IRT person parameters are more invariant across different item sets, and that the CTT item parameters are at least as much invariant in different item sets as the IRT item parameters. The results furthermore demonstrate that, with regards to the invariance property, IRT item/person parameters are in general empirically superior to CTT parameters, but only if the appropriate IRT model is used for modelling the data.

  5. Modeling differential item functioning with group-specific item parameters: A computerized adaptive testing application

    NARCIS (Netherlands)

    Makransky, Guido; Glas, Cornelis A.W.

    2013-01-01

    Many important decisions are made based on the results of tests administered under different conditions in the fields of educational and psychological testing. Inaccurate inferences are often made if the property of measurement invariance (MI) is not assessed across these conditions. The importance

  6. Single event upset test programs

    International Nuclear Information System (INIS)

    Russen, L.C.

    1984-11-01

    It has been shown that the heavy ions in cosmic rays can give rise to single event upsets in VLSI random access memory devices (RAMs). Details are given of the programs written to test 1K, 4K, 16K and 64K memories during their irradiation with heavy charged ions, in order to simulate the effects of cosmic rays in space. The test equipment, which is used to load the memory device to be tested with a known bit pattern, and subsequently interrogate it for upsets, or "flips", is fully described. (author)

  7. Psychometric properties of a single-item scale to assess sleep quality among individuals with fibromyalgia

    Directory of Open Access Journals (Sweden)

    Sadosky Alesia B

    2009-06-01

    Background: Sleep disturbances are a common and bothersome symptom of fibromyalgia (FM). This study reports psychometric properties of a single-item scale to assess sleep quality among individuals with FM. Methods: Analyses were based on data from two randomized, double-blind, placebo-controlled trials of pregabalin (studies 1056 and 1077). In a daily diary, patients reported the quality of their sleep on a numeric rating scale ranging from 0 ("best possible sleep") to 10 ("worst possible sleep"). Test-retest reliability of the Sleep Quality Scale was evaluated by computing intraclass correlation coefficients. Pearson correlation coefficients were computed between baseline Sleep Quality scores and baseline pain diary and Medical Outcomes Study (MOS) Sleep scores. Responsiveness to treatment was evaluated by standardized effect sizes computed as the difference between least squares mean changes in Sleep Quality scores in the pregabalin and placebo groups divided by the standard deviation of Sleep Quality scores across all patients at baseline. Results: Studies 1056 and 1077 included 748 and 745 patients, respectively. Most patients were female (study 1056: 94.4%; study 1077: 94.5%) and white (study 1056: 90.2%; study 1077: 91.0%). Mean ages were 48.8 years (study 1056) and 50.1 years (study 1077). Test-retest reliability coefficients of the Sleep Quality Scale were 0.91 and 0.90 in the 1056 and 1077 studies, respectively. Pearson correlation coefficients between baseline Sleep Quality scores and baseline pain diary scores were 0.64 (p ... Conclusion: These results provide evidence of the reproducibility, convergent validity, and responsiveness to treatment of the Sleep Quality Scale and provide a foundation for its further use and evaluation in FM patients.

  8. Effects of Reducing the Cognitive Load of Mathematics Test Items on Student Performance

    Directory of Open Access Journals (Sweden)

    Susan C. Gillmor

    2015-01-01

    This study explores a new item-writing framework for improving the validity of math assessment items. The authors transfer insights from Cognitive Load Theory (CLT), traditionally used in instructional design, to educational measurement. Fifteen multiple-choice math assessment items were modified using research-based strategies for reducing extraneous cognitive load. An experimental design with 222 middle-school students tested the effects of the reduced cognitive load items on student performance and anxiety. Significant findings confirm the main research hypothesis that reducing the cognitive load of math assessment items improves student performance. Three load-reducing item modifications are identified as particularly effective for reducing item difficulty: signalling important information, aesthetic item organization, and removing extraneous content. Load reduction was not shown to impact student anxiety. Implications for classroom assessment and future research are discussed.

  9. An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test

    Science.gov (United States)

    Kahraman, Nilüfer

    2014-01-01

    Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…

  10. Work-related stress assessed by a text message single-item stress question.

    Science.gov (United States)

    Arapovic-Johansson, B; Wåhlin, C; Kwak, L; Björklund, C; Jensen, I

    2017-12-02

    Given the prevalence of work stress-related ill-health in the Western world, it is important to find cost-effective, easy-to-use and valid measures which can be used both in research and in practice. This study examined the validity and reliability of the single-item stress question (SISQ), distributed weekly by short message service (SMS) and used for measurement of work-related stress. The convergent validity was assessed through associations between the SISQ and subscales of the Job Demand-Control-Support model, the Effort-Reward Imbalance model and scales measuring depression, exhaustion and sleep. The predictive validity was assessed using SISQ data collected through SMS. The reliability was analysed by the test-retest procedure. Correlations between the SISQ and all the subscales except for job strain and esteem reward were significant, ranging from -0.186 to 0.627. The SISQ could also predict sick leave, depression and exhaustion at 12-month follow-up. The analysis on reliability revealed a satisfactory stability with a weighted kappa between 0.804 and 0.868. The SISQ, administered through SMS, can be used for the screening of stress levels in a working population. © The Author 2017. Published by Oxford University Press on behalf of the Society of Occupational Medicine.

  11. Application of Item Response Theory to Tests of Substance-related Associative Memory

    Science.gov (United States)

    Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

    2015-01-01

    A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051

  12. Above-Level Test Item Functioning across Examinee Age Groups

    Science.gov (United States)

    Warne, Russell T.; Doty, Kristine J.; Malbica, Anne Marie; Angeles, Victor R.; Innes, Scott; Hall, Jared; Masterson-Nixon, Kelli

    2016-01-01

    "Above-level testing" (also called "above-grade testing," "out-of-level testing," and "off-level testing") is the practice of administering to a child a test that is designed for an examinee population that is older or in a more advanced grade. Above-level testing is frequently used to help educators design…

  13. Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

    Science.gov (United States)

    Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

    2015-12-01

    The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores decreased substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
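
    The Mantel-Haenszel chi-square test mentioned above can be sketched for the simplest case of a dichotomously scored item. This is an assumption made for brevity: DOCS items are rated on more than two levels, and the study's exact procedure is not described in the abstract. The function name and the table layout are likewise hypothetical.

```python
def mantel_haenszel_chi2(tables, continuity=True):
    """Mantel-Haenszel chi-square (1 df) for one dichotomous item.

    tables: one (a, b, c, d) tuple per matched score stratum, where
        a = reference group, item endorsed
        b = reference group, item not endorsed
        c = focal group, item endorsed
        d = focal group, item not endorsed
    """
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d
        if total < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d
        m_yes, m_no = a + c, b + d
        sum_a += a
        # Expected count and variance of `a` under the no-DIF hypothesis.
        sum_expected += n_ref * m_yes / total
        sum_variance += (n_ref * n_foc * m_yes * m_no
                         / (total * total * (total - 1)))
    diff = abs(sum_a - sum_expected)
    if continuity:
        diff = max(0.0, diff - 0.5)  # continuity correction, floored at zero
    return diff * diff / sum_variance
```

    The statistic is compared with a chi-square distribution with one degree of freedom (critical value about 3.84 at the .05 level). Matching on total score is what separates differential item functioning from a genuine group difference in the trait.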

  14. A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

    Directory of Open Access Journals (Sweden)

    Eren Halil ÖZBERK

    2017-03-01

    In contrast with previous studies, this study employed various test designs (simple and complex) which allow the evaluation of the overall ability score estimations across multiple real test conditions. In this study, four factors were manipulated, namely the test design, number of items per dimension, correlation between dimensions and item selection methods. Using the generated item and ability parameters, dichotomous item responses were generated using the M3PL compensatory multidimensional IRT model with specified correlations. MCAT composite ability score accuracy was evaluated using absolute bias (ABSBIAS), correlation and the root mean square error (RMSE) between true and estimated ability scores. The results suggest that the multidimensional test structure, number of items per dimension and correlation between dimensions had a significant effect on item selection methods for the overall score estimations. For the simple structure test design, it was found that V1 item selection has the lowest absolute bias estimations for both long and short tests while estimating overall scores. As the model gets complex, the KL item selection method performed better than the other two item selection methods.
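
    Two of the accuracy criteria named above can be written out as a short sketch. The abstract does not give formulas, so taking ABSBIAS as the mean absolute difference between estimated and true scores is an assumption, as are the function names.

```python
import math


def absolute_bias(true_scores, estimated_scores):
    """Mean absolute difference between estimated and true ability scores
    (assumed definition of ABSBIAS; the abstract gives no formula)."""
    pairs = list(zip(true_scores, estimated_scores))
    return sum(abs(est - true) for true, est in pairs) / len(pairs)


def rmse(true_scores, estimated_scores):
    """Root mean square error between estimated and true ability scores."""
    pairs = list(zip(true_scores, estimated_scores))
    return math.sqrt(sum((est - true) ** 2 for true, est in pairs) / len(pairs))
```

    RMSE penalises large individual errors more heavily than the mean absolute difference does, which is one reason simulation studies of this kind tend to report both.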

  15. Optimizing the Use of Response Times for Item Selection in Computerized Adaptive Testing

    Science.gov (United States)

    Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua

    2018-01-01

    Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…

  16. Applications of NLP Techniques to Computer-Assisted Authoring of Test Items for Elementary Chinese

    Science.gov (United States)

    Liu, Chao-Lin; Lin, Jen-Hsiang; Wang, Yu-Chun

    2010-01-01

    The authors report an implemented environment for computer-assisted authoring of test items and provide a brief discussion about the applications of NLP techniques for computer assisted language learning. Test items can serve as a tool for language learners to examine their competence in the target language. The authors apply techniques for…

  17. A Method for Generating Educational Test Items That Are Aligned to the Common Core State Standards

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis; Hogan, James B.; Matovinovic, Donna

    2015-01-01

    The demand for test items far outstrips the current supply. This increased demand can be attributed, in part, to the transition to computerized testing, but it is also linked to dramatic changes in how 21st century educational assessments are designed and administered. One way to address this growing demand is with automatic item generation.…

  18. Relationships among Classical Test Theory and Item Response Theory Frameworks via Factor Analytic Models

    Science.gov (United States)

    Kohli, Nidhi; Koran, Jennifer; Henn, Lisa

    2015-01-01

    There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…

  19. Strategies for Controlling Item Exposure in Computerized Adaptive Testing with the Generalized Partial Credit Model

    Science.gov (United States)

    Davis, Laurie Laughlin

    2004-01-01

    Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline…

  20. Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

    Science.gov (United States)

    Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

    2010-01-01

    This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…

  1. Latent Trait Theory Applications to Test Item Bias Methodology. Research Memorandum No. 1.

    Science.gov (United States)

    Osterlind, Steven J.; Martois, John S.

    This study discusses latent trait theory applications to test item bias methodology. A real data set is used in describing the rationale and application of the Rasch probabilistic model item calibrations across various ethnic group populations. A high school graduation proficiency test covering reading comprehension, writing mechanics, and…

  2. Test Score Equating Using Discrete Anchor Items versus Passage-Based Anchor Items: A Case Study Using "SAT"® Data. Research Report. ETS RR-14-14

    Science.gov (United States)

    Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill

    2014-01-01

    The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…

  3. Piecewise Polynomial Fitting with Trend Item Removal and Its Application in a Cab Vibration Test

    Directory of Open Access Journals (Sweden)

    Wu Ren

    2018-01-01

    The trend item of a long-term vibration signal is difficult to remove. This paper proposes a piecewise integration method to remove trend items. Examples of direct integration without trend item removal, global integration after piecewise polynomial fitting with trend item removal, and direct integration after piecewise polynomial fitting with trend item removal were simulated. The results showed that direct integration of the fitted piecewise polynomial provided greater acceleration and displacement precision than the other two integration methods. A vibration test was then performed on a special equipment cab. The results indicated that direct integration by piecewise polynomial fitting with trend item removal was highly consistent with the measured signal data. However, the direct integration method without trend item removal resulted in signal distortion. The proposed method can help with frequency domain analysis of vibration signals and modal parameter identification for such equipment.

  4. Science Library of Test Items. Volume Eight. Mastery Testing Program. Series 3 & 4 Supplements to Introduction and Manual.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    Continuing a series of short tests aimed at measuring student mastery of specific skills in the natural sciences, this supplementary volume includes teachers' notes, a users' guide and inspection copies of test items 27 to 50. Answer keys and test scoring statistics are provided. The items are designed for grades 7 through 10, and a list of the…

  5. Using personality item characteristics to predict single-item reliability, retest reliability, and self-other agreement

    NARCIS (Netherlands)

    de Vries, Reinout Everhard; Realo, Anu; Allik, Jüri

    2016-01-01

    The use of reliability estimates is increasingly scrutinized as scholars become more aware that test–retest stability and self–other agreement provide a better approximation of the theoretical and practical usefulness of an instrument than its internal reliability. In this study, we investigate item

  6. Development and validation of the Single Item Trait Empathy Scale (SITES).

    Science.gov (United States)

    Konrath, Sara; Meier, Brian P; Bushman, Brad J

    2018-04-01

    Empathy involves feeling compassion for others and imagining how they feel. In this article, we develop and validate the Single Item Trait Empathy Scale (SITES), which contains only one item that takes seconds to complete. In seven studies (N=5,724), the SITES was found to be both reliable and valid. It correlated in expected ways with a wide variety of intrapersonal outcomes. For example, it is negatively correlated with narcissism, depression, anxiety, and alexithymia. In contrast, it is positively correlated with other measures of empathy, self-esteem, subjective well-being, and agreeableness. The SITES also correlates with a wide variety of interpersonal outcomes, especially compassion for others and helping others. The SITES is recommended in situations when time or question quantity is constrained.

  7. Hippocampal damage equally impairs memory for single items and memory for conjunctions.

    Science.gov (United States)

    Stark, Craig E L; Squire, Larry R

    2003-01-01

    single-item and associative memory.

  8. A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing.

    Science.gov (United States)

    van Rijn, Peter W; Ali, Usama S

    2017-05-01

    We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.

  9. Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

    Science.gov (United States)

    Ilich, Maria O.

    Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance consisted of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.

  10. Quantitative penetration testing with item response theory (extended version)

    NARCIS (Netherlands)

    Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle Ida Antoinette

    2013-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Therefore, penetration testing has thus far been used as a qualitative research method. To enable quantitative approaches to security risk management,

  11. The effects of linguistic modification on ESL students' comprehension of nursing course test items.

    Science.gov (United States)

    Bosher, Susan; Bowles, Melissa

    2008-01-01

    Recent research has indicated that language may be a source of construct-irrelevant variance for non-native speakers of English, or English as a second language (ESL) students, when they take exams. As a result, exams may not accurately measure knowledge of nursing content. One accommodation often used to level the playing field for ESL students is linguistic modification, a process by which the reading load of test items is reduced while the content and integrity of the item are maintained. Research on the effects of linguistic modification has been conducted on examinees in the K-12 population, but is just beginning in other areas. This study describes the collaborative process by which items from a pathophysiology exam were linguistically modified and subsequently evaluated for comprehensibility by ESL students. Findings indicate that in a majority of cases, modification improved examinees' comprehension of test items. Implications for test item writing and future research are discussed.

  12. Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

    Science.gov (United States)

    Alsadaawi, Abdullah Saleh

    2017-01-01

    The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…

  13. Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models

    Science.gov (United States)

    Woods, Carol M.; Grimm, Kevin J.

    2011-01-01

    In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…

  14. A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing

    Science.gov (United States)

    Weissman, Alexander

    2006-01-01

    A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level ([theta]) estimation and vice versa. When discrepancies exist between an examinee's estimated and true [theta] levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with…

  15. Australian Biology Test Item Bank, Years 11 and 12. Volume II: Year 12.

    Science.gov (United States)

    Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

    This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

  16. Australian Biology Test Item Bank, Years 11 and 12. Volume I: Year 11.

    Science.gov (United States)

    Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

    This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

  17. What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

    Science.gov (United States)

    Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

    Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…

  18. The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory

    Science.gov (United States)

    Anil, Duygu

    2008-01-01

    In this study, the predictive power of experts' predictions of item characteristics, for conditions in which try-out practices cannot be applied, was examined for item characteristics computed according to classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…

  19. Development of an item bank for computerized adaptive test (CAT) measurement of pain

    DEFF Research Database (Denmark)

    Petersen, Morten Aa.; Aaronson, Neil K; Chie, Wei-Chu

    2016-01-01

    PURPOSE: Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured...... were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale....... CONCLUSIONS: We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ...

  20. Explanatory item response modelling of an abstract reasoning assessment: A case for modern test design

    OpenAIRE

    Helland, Fredrik

    2016-01-01

    Assessment is an integral part of society and education, and for this reason it is important to know what you measure. This thesis is about explanatory item response modelling of an abstract reasoning assessment, with the objective to create a modern test design framework for automatic generation of valid and precalibrated items of abstract reasoning. Modern test design aims to strengthen the connections between the different components of a test, with a stress on strong theory, systematic it...

  1. A comparison of discriminant logistic regression and Item Response Theory Likelihood-Ratio Tests for Differential Item Functioning (IRTLRDIF) in polytomous short tests.

    Science.gov (United States)

    Hidalgo, María D; López-Martínez, María D; Gómez-Benito, Juana; Guilera, Georgina

    2016-01-01

    Short scales are typically used in the social, behavioural and health sciences. This is relevant since test length can influence whether items showing DIF are correctly flagged. This paper compares the relative effectiveness of discriminant logistic regression (DLR) and IRTLRDIF for detecting DIF in polytomous short tests. A simulation study was designed. Test length, sample size, DIF amount and number of item response categories were manipulated. Type I error and power were evaluated. IRTLRDIF and DLR yielded Type I error rates close to nominal level in no-DIF conditions. Under DIF conditions, Type I error rates were affected by test length, DIF amount, degree of test contamination, sample size and number of item response categories. DLR showed a higher Type I error rate than did IRTLRDIF. Power rates were affected by DIF amount and sample size, but not by test length. DLR achieved higher power rates than did IRTLRDIF in very short tests, although the high Type I error rate involved means that this result cannot be taken into account. Test length had an important impact on the Type I error rate. IRTLRDIF and DLR showed a low power rate in short tests and with small sample sizes.

  2. Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

    Science.gov (United States)

    Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

    2014-01-01

    Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
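
    The adaptive selection step described above (choosing the item that most sharpens the current ability estimate) can be sketched with a maximum-information rule. The sketch assumes dichotomous items under a two-parameter logistic model, which is a simplification since PROMIS items have several response levels. The item bank, its identifiers and the function names are hypothetical.

```python
import math


def prob_2pl(theta, a, b):
    """Probability of a positive response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, bank, administered):
    """Return the id of the unused item that is most informative at theta_hat.

    bank:         dict mapping item id -> (discrimination a, difficulty b)
    administered: set of item ids already presented to this respondent
    """
    candidates = [item for item in bank if item not in administered]
    return max(candidates,
               key=lambda item: item_information(theta_hat, *bank[item]))


# Hypothetical three-item bank: (discrimination, difficulty).
example_bank = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
```

    In a full CAT, `theta_hat` would be re-estimated after every response and the loop would stop once a precision or length criterion is met. Both steps are omitted here.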

  3. The quadratic relationship between difficulty of intelligence test items and their correlations with working memory

    Directory of Open Access Journals (Sweden)

    Tomasz Smoleń

    2015-08-01

    Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf test, the Raven test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.

  4. The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

    Science.gov (United States)

    Smolen, Tomasz; Chuderski, Adam

    2015-01-01

    Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.

  5. Face Validity of the Single Work Ability Item: Comparison with Objectively Measured Heart Rate Reserve over Several Days

    Science.gov (United States)

    Gupta, Nidhi; Jensen, Bjørn Søvsø; Søgaard, Karen; Carneiro, Isabella Gomes; Christiansen, Caroline Stordal; Hanisch, Christiana; Holtermann, Andreas

    2014-01-01

    Purpose: The purpose of this study was to investigate the face validity of the self-reported single item work ability with objectively measured heart rate reserve (%HRR) among blue-collar workers. Methods: We utilized data from 127 blue-collar workers (Female = 53; Male = 74) aged 18–65 years from the cross-sectional “New method for Objective Measurements of physical Activity in Daily living (NOMAD)” study. The workers reported their single item work ability and completed an aerobic capacity cycling test and objective measurements of heart rate reserve monitored with Actiheart for 3–4 days with a total of 5,810 h, including 2,640 working hours. Results: A significant moderate correlation between work ability and %HRR was observed among males (R = −0.33, P = 0.005), but not among females (R = 0.11, P = 0.431). In a gender-stratified multi-adjusted logistic regression analysis, males with high %HRR were more likely to report a reduced work ability compared to males with low %HRR [OR = 4.75, 95% confidence interval (95% CI) = 1.31 to 17.25]. However, this association was not found among females (OR = 0.26, 95% CI 0.03 to 2.16), and a significant interaction between work ability, %HRR and gender was observed (P = 0.03). Conclusions: The observed association between work ability and objectively measured %HRR over several days among male blue-collar workers supports the face validity of the single work ability item. It is a useful and valid measure of the relation between physical work demands and resources among male blue-collar workers. The contrasting association among females needs to be further investigated. PMID:24840350

  6. Face Validity of the Single Work Ability Item: Comparison with Objectively Measured Heart Rate Reserve over Several Days

    Directory of Open Access Journals (Sweden)

    Nidhi Gupta

    2014-05-01

    Purpose: The purpose of this study was to investigate the face validity of the self-reported single item work ability with objectively measured heart rate reserve (%HRR) among blue-collar workers. Methods: We utilized data from 127 blue-collar workers (Female = 53; Male = 74) aged 18–65 years from the cross-sectional “New method for Objective Measurements of physical Activity in Daily living (NOMAD)” study. The workers reported their single item work ability and completed an aerobic capacity cycling test and objective measurements of heart rate reserve monitored with Actiheart for 3–4 days with a total of 5,810 h, including 2,640 working hours. Results: A significant moderate correlation between work ability and %HRR was observed among males (R = −0.33, P = 0.005), but not among females (R = 0.11, P = 0.431). In a gender-stratified multi-adjusted logistic regression analysis, males with high %HRR were more likely to report a reduced work ability compared to males with low %HRR [OR = 4.75, 95% confidence interval (95% CI) = 1.31 to 17.25]. However, this association was not found among females (OR = 0.26, 95% CI 0.03 to 2.16), and a significant interaction between work ability, %HRR and gender was observed (P = 0.03). Conclusions: The observed association between work ability and objectively measured %HRR over several days among male blue-collar workers supports the face validity of the single work ability item. It is a useful and valid measure of the relation between physical work demands and resources among male blue-collar workers. The contrasting association among females needs to be further investigated.

  7. Comparison of Classical Test Theory and Item Response Theory in Individual Change Assessment

    NARCIS (Netherlands)

    Jabrayilov, Ruslan; Emons, Wilco H. M.; Sijtsma, Klaas

    2016-01-01

    Clinical psychologists are advised to assess clinical and statistical significance when assessing change in individual patients. Individual change assessment can be conducted using either the methodologies of classical test theory (CTT) or item response theory (IRT). Researchers have been optimistic

  8. The Stanford Leisure-Time Activity Categorical Item (L-Cat): a single categorical item sensitive to physical activity changes in overweight/obese women.

    Science.gov (United States)

    Kiernan, M; Schoffman, D E; Lee, K; Brown, S D; Fair, J M; Perri, M G; Haskell, W L

    2013-12-01

    Physical activity is essential for chronic disease prevention, yet […] the Stanford Leisure-Time Activity Categorical Item (L-Cat) is a single item comprising six descriptive categories ranging from inactive to very active. This novel methodological approach assesses national activity recommendations as well as multiple clinically relevant categories below and above the recommendations, and incorporates critical methodological principles that enhance psychometrics (reliability, validity and sensitivity to change). We evaluated the L-Cat's psychometrics among 267 overweight/obese women who were asked to meet the national activity recommendations in a randomized behavioral weight-loss trial. The L-Cat had excellent test-retest reliability (κ=0.64, P […] L-Cat category at 6 months was associated with 1059 more daily pedometer steps (95% CI 712-1407, β=0.38, P […] L-Cat categories differentiated from each other in a dose-response gradient for steps and weight loss (Ps […] L-Cat was sensitive to change in response to the trial's activity component. Women increased one L-Cat category at 6 months (M=1.0±1.4, P […] L-Cat categories at 6 months lost more weight than those who did not (M=-4.6%, 95% CI -6.7 to -2.5, P […] L-Cat has timely potential for clinical use such as tracking activity changes via electronic medical records, especially among overweight/obese populations who are unable or unlikely to reach national recommendations.

  9. Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

    Science.gov (United States)

    Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

    2018-02-01

    The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Graded Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
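
As a rough illustration of the graded response model named in this abstract, the sketch below computes the probability of each ordered response category from an item's discrimination and threshold parameters. It assumes Samejima's logistic formulation; the abstract does not state the parameterization or software used, and the function name and parameter values are hypothetical, not estimates from the FTCD data.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait level; a: discrimination;
    thresholds: increasing threshold parameters b_1 < b_2 < ... (hypothetical values).
    """
    # Cumulative curves: probability of responding in category k or higher.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical four-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.2])
```

Three thresholds give four categories, and the probabilities sum to one at every trait level because the cumulative curves telescope.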

  10. Overcoming the effects of differential skewness of test items in scale construction

    Directory of Open Access Journals (Sweden)

    Johann M. Schepers

    2004-10-01

    The principal objective of the study was to develop a procedure for overcoming the effects of differential skewness of test items in scale construction. It was shown that the degree of skewness of test items places an upper limit on the correlations between the items, regardless of the contents of the items. If the items are ordered in terms of skewness, the resulting intercorrelation matrix forms a simplex or a pseudo-simplex. Factoring such a matrix results in a multiplicity of factors, most of which are artifacts. A procedure for overcoming this problem was demonstrated with items from the Locus of Control Inventory (Schepers, 1995). The analysis was based on a sample of 1662 first-year university students.
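
The upper limit on inter-item correlations is easiest to see in the simplest case of two dichotomous items, which is an assumption made here for illustration (the inventory in the study may use a different response format). When the two items have different endorsement proportions, the largest attainable phi coefficient is below 1. The function name is hypothetical.

```python
import math

def phi_max(p1, p2):
    """Largest possible phi coefficient between two 0/1 items
    with endorsement proportions p1 and p2 (each strictly between 0 and 1)."""
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    lo, hi = sorted((p1, p2))
    # The joint proportion endorsing both items cannot exceed the smaller marginal,
    # which caps the covariance and therefore the correlation.
    return math.sqrt(lo * (1.0 - hi) / (hi * (1.0 - lo)))

equal_skew = phi_max(0.5, 0.5)      # identical marginals: no ceiling below 1
opposite_skew = phi_max(0.1, 0.9)   # strongly opposed skew: low ceiling
```

Items skewed in opposite directions can therefore correlate only weakly even if they measure the same construct, which is consistent with the artifact factors the abstract describes.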

  11. Measuring single constructs by single items: Constructing an even shorter version of the "Short Five" personality inventory.

    Directory of Open Access Journals (Sweden)

    Kenn Konstabel

    The aim of this study was to construct a short, 30-item personality questionnaire that would be, in terms of content and meaning of the scores, as comparable as possible with longer, well-established inventories such as NEO PI-R and its clones. To do this, we shortened the formerly constructed 60-item "Short Five" (S5) by half so that each subscale would be represented by a single item. We compared all possibilities of selecting 30 items (preserving balanced keying within each domain of the five-factor model) in terms of correlations with well-established scales, self-peer correlations, and clarity of meaning, and selected an optimal combination for each domain. The resulting shortened questionnaire, XS5, was compared to the original S5 using data from student samples in 6 different countries (Estonia, Finland, UK, Germany, Spain, and China), and a representative Finnish sample. The correlations between XS5 domain scales and their longer counterparts from well-established scales ranged from 0.74 to 0.84; the difference from the equivalent correlations for full version of S5 or from meta-analytic short-term dependability coefficients of NEO PI-R was not large. In terms of prediction of external criteria (emotional experience and self-reported behaviours), there were no important differences between XS5, S5, and the longer well-established scales. Controlling for acquiescence did not improve the prediction of criteria, self-peer correlations, or correlations with longer scales, but it did improve internal reliability and, in some analyses, comparability of the principal component structure. XS5 can be recommended as an economic measure of the five-factor model of personality at the level of domain scales; it has reasonable psychometric properties, fair correlations with longer well-established scales, and it can predict emotional experience and self-reported behaviours no worse than S5. When subscales are essential, we would still recommend using the

  12. Measuring single constructs by single items: Constructing an even shorter version of the “Short Five” personality inventory

    Science.gov (United States)

    Konstabel, Kenn; Lönnqvist, Jan-Erik; Leikas, Sointu; García Velázquez, Regina; Qin, Hiaying; Verkasalo, Markku; Walkowitz, Gari

    2017-01-01

    The aim of this study was to construct a short, 30-item personality questionnaire that would be, in terms of content and meaning of the scores, as comparable as possible with longer, well-established inventories such as NEO PI-R and its clones. To do this, we shortened the formerly constructed 60-item “Short Five” (S5) by half so that each subscale would be represented by a single item. We compared all possibilities of selecting 30 items (preserving balanced keying within each domain of the five-factor model) in terms of correlations with well-established scales, self-peer correlations, and clarity of meaning, and selected an optimal combination for each domain. The resulting shortened questionnaire, XS5, was compared to the original S5 using data from student samples in 6 different countries (Estonia, Finland, UK, Germany, Spain, and China), and a representative Finnish sample. The correlations between XS5 domain scales and their longer counterparts from well-established scales ranged from 0.74 to 0.84; the difference from the equivalent correlations for full version of S5 or from meta-analytic short-term dependability coefficients of NEO PI-R was not large. In terms of prediction of external criteria (emotional experience and self-reported behaviours), there were no important differences between XS5, S5, and the longer well-established scales. Controlling for acquiescence did not improve the prediction of criteria, self-peer correlations, or correlations with longer scales, but it did improve internal reliability and, in some analyses, comparability of the principal component structure. XS5 can be recommended as an economic measure of the five-factor model of personality at the level of domain scales; it has reasonable psychometric properties, fair correlations with longer well-established scales, and it can predict emotional experience and self-reported behaviours no worse than S5. When subscales are essential, we would still recommend using the full version

  13. A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests.

    Science.gov (United States)

    Kingsbury, G. Gage; Zara, Anthony R.

    1991-01-01

    This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that the price in terms of additional test items of using constrained CAT for content balancing is much smaller than that of using testlets. (SLD)

  14. Using Set Covering with Item Sampling to Analyze the Infeasibility of Linear Programming Test Assembly Models

    Science.gov (United States)

    Huitzing, Hiddo A.

    2004-01-01

    This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be…

  15. Test of Achievement in Quantitative Economics for Secondary Schools: Construction and Validation Using Item Response Theory

    Science.gov (United States)

    Eleje, Lydia I.; Esomonu, Nkechi P. M.

    2018-01-01

    A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…

  16. Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-01-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…

  17. Development of abbreviated eight-item form of the Penn Verbal Reasoning Test.

    Science.gov (United States)

    Bilker, Warren B; Wierzbicki, Michael R; Brensinger, Colleen M; Gur, Raquel E; Gur, Ruben C

    2014-12-01

    The ability to reason with language is a highly valued cognitive capacity that correlates with IQ measures and is sensitive to damage in language areas. The Penn Verbal Reasoning Test (PVRT) is a 29-item computerized test for measuring abstract analogical reasoning abilities using language. The full test can take over half an hour to administer, which limits its applicability in large-scale studies. We previously described a procedure for abbreviating a clinical rating scale and a modified procedure for reducing tests with a large number of items. Here we describe the application of the modified method to reducing the number of items in the PVRT to a parsimonious subset of items that accurately predicts the total score. As in our previous reduction studies, a split sample is used for model fitting and validation, with cross-validation to verify results. We find that an 8-item scale predicts the total 29-item score well, achieving a correlation of .9145 for the reduced form for the model fitting sample and .8952 for the validation sample. The results indicate that a drastically abbreviated version, which cuts administration time by more than 70%, can be safely administered as a predictor of PVRT performance. © The Author(s) 2014.

  18. Development of Abbreviated Eight-Item Form of the Penn Verbal Reasoning Test

    Science.gov (United States)

    Bilker, Warren B.; Wierzbicki, Michael R.; Brensinger, Colleen M.; Gur, Raquel E.; Gur, Ruben C.

    2014-01-01

    The ability to reason with language is a highly valued cognitive capacity that correlates with IQ measures and is sensitive to damage in language areas. The Penn Verbal Reasoning Test (PVRT) is a 29-item computerized test for measuring abstract analogical reasoning abilities using language. The full test can take over half an hour to administer, which limits its applicability in large-scale studies. We previously described a procedure for abbreviating a clinical rating scale and a modified procedure for reducing tests with a large number of items. Here we describe the application of the modified method to reducing the number of items in the PVRT to a parsimonious subset of items that accurately predicts the total score. As in our previous reduction studies, a split sample is used for model fitting and validation, with cross-validation to verify results. We find that an 8-item scale predicts the total 29-item score well, achieving a correlation of .9145 for the reduced form for the model fitting sample and .8952 for the validation sample. The results indicate that a drastically abbreviated version, which cuts administration time by more than 70%, can be safely administered as a predictor of PVRT performance. PMID:24577310

  19. Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

    Science.gov (United States)

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

    2014-01-01

    We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930

  20. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
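
A minimal sketch of the three-parameter logistic model referred to in this abstract: the probability of a correct answer depends on the test taker's ability and on the item's discrimination, difficulty, and guessing parameters. The function name and parameter values are illustrative assumptions; they are not the TUV estimates, and the scaling constant sometimes applied to the exponent (1.7) is omitted.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: ability; a: discrimination; b: difficulty; c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty, 20% guessing floor.
p_at_difficulty = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
p_low_ability = p_correct_3pl(theta=-6.0, a=1.2, b=0.0, c=0.2)
```

At an ability equal to the item difficulty the probability sits halfway between the guessing floor and 1, and for very low ability it approaches the guessing parameter.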

  1. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    Directory of Open Access Journals (Sweden)

    Suttida Rakkapao

    2016-10-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test’s distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.

  2. Test-retest reliability of Eurofit Physical Fitness items for children with visual impairments

    NARCIS (Netherlands)

    Houwen, Suzanne; Visscher, Chris; Hartman, Esther; Lemmink, Koen A. P. M.

    The purpose of this study was to examine the test-retest reliability of physical fitness items from the European Test of Physical Fitness (Eurofit) for children with visual impairments. A sample of 21 children, ages 6-12 years, that were recruited from a special school for children with visual

  3. The Relative Importance of Persons, Items, Subtests, and Languages to TOEFL Test Variance.

    Science.gov (United States)

    Brown, James Dean

    1999-01-01

    Explored the relative contributions to Test of English as a Foreign Language (TOEFL) score dependability of various numbers of persons, items, subtests, languages, and their various interactions. Sampled 15,000 test takers, 1000 each from 15 different language backgrounds. (Author/VWL)

  4. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    Science.gov (United States)

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
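
For orientation, the logistic regression model commonly used for DIF detection can be written as below. This is the standard textbook form, assumed here rather than quoted from the article: Y is the item response, X the matching variable (for example the total score), and G a group indicator.

```latex
\operatorname{logit} P(Y = 1 \mid X, G) = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \times G)
```

Under this form, uniform DIF corresponds to $\beta_2 \neq 0$ with $\beta_3 = 0$, and nonuniform DIF to $\beta_3 \neq 0$; likelihood ratio and Wald tests assess these coefficients.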

  5. Use of differential item functioning (DIF) analysis for bias analysis in test construction

    Directory of Open Access Journals (Sweden)

    Marié De Beer

    2004-10-01

    Summary: Where differential item functioning (DIF) procedures for item analysis based on item response theory (IRT) are used during test construction, it is possible to plot item characteristic curves for the same item for different subgroups. These curves indicate how each item functions for the different subgroups at different ability levels. DIF is indicated by the area between the curves. DIF was used in the construction of the 'Learning Potential Computerised Adaptive Test (LPCAT)' to identify the items that displayed bias with respect to gender, culture, language or level of education. Items that exceeded a predetermined level of DIF were omitted from the final item bank, regardless of which subgroup was advantaged or disadvantaged. The process and results of the DIF analysis are discussed.
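
The area between two subgroup curves can be approximated numerically. The sketch below assumes two-parameter logistic item characteristic curves and a simple trapezoidal rule over a bounded ability range; the abstract does not specify the model or the integration method, and all names and parameter values are hypothetical.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def area_between_iccs(ref, focal, lo=-10.0, hi=10.0, steps=2000):
    """Unsigned area between the curves of a reference and a focal group.

    ref, focal: (a, b) parameter pairs estimated separately per subgroup.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        gap = abs(icc_2pl(theta, *ref) - icc_2pl(theta, *focal))
        # Trapezoidal rule: end points carry half weight.
        total += (0.5 if i in (0, steps) else 1.0) * gap
    return total * h

# Same discrimination, difficulty shifted by half a unit for the focal group.
area = area_between_iccs(ref=(1.0, 0.0), focal=(1.0, 0.5))
```

An item would be flagged when this area exceeds a chosen cut-off. With equal discriminations the area is close to the absolute difference between the two difficulty parameters.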

  6. Fostering a student's skill for analyzing test items through an authentic task

    Science.gov (United States)

    Setiawan, Beni; Sabtiawan, Wahyu Budi

    2017-08-01

    Analyzing test items is a skill that prospective teachers must master in order to determine the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items, covering validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students of the science education study program, science and mathematics faculty, Universitas Negeri Surabaya, enrolled in an assessment course. The research design was a one-group posttest design. The treatment was an authentic task in which the students developed test items and then analyzed them like a professional assessor, using Microsoft Excel and Anates software. The data were analyzed descriptively: the data on students' skill were displayed and then related to theories or previous empirical studies. The research showed that the task helped the students acquire the skills. Thirty-one students got a perfect score for the analysis, five students achieved 97% mastery, two students had 92% mastery, and another two students got 89% and 79% mastery. The implication of the finding is that students given authentic tasks that require them to perform like a professional are more likely to achieve the professional skills by the end of learning.
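
Two of the statistics listed in this abstract, level of difficulty and item discrimination index, can be sketched in a few lines. The version below assumes dichotomously scored items and the common upper-group versus lower-group definition of discrimination; the study itself used Microsoft Excel and Anates, whose exact formulas are not given in the abstract. The data and function name are made up for illustration.

```python
def item_analysis(responses, item, frac=0.27):
    """Classical difficulty and discrimination index for one 0/1-scored item.

    responses: one list of 0/1 item scores per test taker.
    frac: share of test takers placed in the upper and lower groups.
    """
    n = len(responses)
    totals = [sum(r) for r in responses]
    # Rank test takers by total score, highest first.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * frac))
    upper, lower = order[:k], order[-k:]
    difficulty = sum(r[item] for r in responses) / n
    p_upper = sum(responses[i][item] for i in upper) / k
    p_lower = sum(responses[i][item] for i in lower) / k
    return difficulty, p_upper - p_lower

# Hypothetical scores: four test takers, three items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
difficulty, discrimination = item_analysis(responses, item=1, frac=0.5)
```

Difficulty here is the proportion answering correctly, so higher values mean easier items; the discrimination index is the difference in that proportion between the top and bottom scorers.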

  7. Using response-time constraints in item selection to control for differential speededness in computerized adaptive testing

    NARCIS (Netherlands)

    van der Linden, Willem J.; Scrams, David J.; Schnipke, Deborah L.

    2003-01-01

    This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has

  8. A single-item global job satisfaction measure is associated with quantitative blood immune indices in white-collar employees.

    Science.gov (United States)

    Nakata, Akinori; Irie, Masahiro; Takahashi, Masaya

    2013-01-01

    Although a single-item job satisfaction measure has been shown to be as reliable and inclusive as multiple-item scales in relation to health, studies including immunological data are few. The purpose of this study was to evaluate the validity of single-item job and family life satisfaction based on its association with immune indices. A total of 189 white-collar employees (70% men) underwent a blood draw for the measurement of natural killer (NK), total T, and B cell counts as well as plasma immunoglobulin (Ig) G concentrations and completed single-item job and family life satisfaction measures, respectively. The response options for satisfaction measures were 'dissatisfied' (coded 1) to 'satisfied' (coded 4). Spearman's partial correlations controlling for cofactors revealed that increased job satisfaction was positively associated with NK cells (rsp=0.201, p=0.007) and IgG (rsp=0.178, p=0.018), while family life satisfaction was unrelated to immune indices. Those who reported a combination of low job/low family life satisfaction had significantly lower NK and higher B cell counts than those with a high job/high family life satisfaction. Our study suggests that the single-item summary measure of job satisfaction, but not family life satisfaction, may be a valid tool to evaluate immune status in healthy white-collar employees.

  9. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    Science.gov (United States)

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction: The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods: We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion: Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
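
One of the classical checks listed above, floor and ceiling effects, amounts to counting how many respondents obtain the lowest or highest possible scale score. The sketch below is illustrative only; the scores, the score range, and the function name are invented, and the article does not prescribe a particular cut-off for calling an effect problematic.

```python
def floor_ceiling(scores, min_score, max_score):
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling

# Hypothetical scale scores on a 0-10 scale.
floor_share, ceiling_share = floor_ceiling([0, 0, 3, 5, 10, 10, 10, 7], 0, 10)
```

Large shares at either extreme suggest the scale cannot distinguish respondents at that end of the continuum.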

  10. Science Literacy: How do High School Students Solve PISA Test Items?

    Science.gov (United States)

    Wati, F.; Sinaga, P.; Priyandoko, D.

    2017-09-01

    The Programme for International Student Assessment (PISA) assesses students’ science literacy in real-life contexts and a wide variety of situations. Therefore, the results do not provide adequate information for teachers to probe students’ science literacy, because the range of materials taught at schools depends on the curriculum used. This study aims to investigate how junior high school students in Indonesia solve PISA test items. Data were collected by administering PISA test items from the greenhouse unit to 36 ninth-grade students. Students’ answers were analyzed qualitatively for each item, based on the competence tested in the problem. The way students answer a problem exhibits their ability in a particular competence, which is influenced by a number of factors: students’ unfamiliarity with the test construction, low reading performance, difficulty connecting the available information to the question, and limited ability to express their ideas effectively and readably. In response, selected PISA test items can be used alongside the teaching topic to familiarize students with science literacy.

  11. The effect of heightened awareness of observation on consumption of a multi-item laboratory test meal in females.

    Science.gov (United States)

    Robinson, Eric; Proctor, Michael; Oldham, Melissa; Masic, Una

    2016-09-01

    Human eating behaviour is often studied in the laboratory, but whether the extent to which a participant believes that their food intake is being measured influences consumption of different meal items is unclear. Our main objective was to examine whether heightened awareness of observation of food intake affects consumption of different food items during a lunchtime meal. One hundred and fourteen female participants were randomly assigned to an experimental condition designed to heighten participant awareness of observation or a condition in which awareness of observation was lower, before consuming an ad libitum multi-item lunchtime meal in a single session study. Under conditions of heightened awareness, participants tended to eat less of an energy dense snack food (cookies) in comparison to the less aware condition. Consumption of other meal items and total energy intake were similar in the heightened awareness vs. less aware condition. Exploratory secondary analyses suggested that the effect heightened awareness had on reduced cookie consumption was dependent on weight status, as well as trait measures of dietary restraint and disinhibition, whereby only participants with overweight/obesity, high disinhibition or low restraint reduced their cookie consumption. Heightened awareness of observation may cause females to reduce their consumption of an energy dense snack food during a test meal in the laboratory and this effect may be moderated by participant individual differences. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  12. The validity of the Satisfaction with Life Scale in adolescents and a comparison with single-item life satisfaction measures: a preliminary study.

    Science.gov (United States)

    Jovanović, Veljko

    2016-12-01

    The validity of the life satisfaction measures commonly used among adults has been rarely examined in adolescent samples. The present research had two main goals: (1) to evaluate the structural validity of the Satisfaction with Life Scale (SWLS) among adolescents and to test measurement invariance across gender; (2) to compare the criterion and convergent validity of the SWLS and single-item life satisfaction measures among adolescents. Three samples of Serbian adolescents were recruited for the present research. Study 1 (N = 481, M age  = 17.01 years) examined the structure of the SWLS via confirmatory factor analysis (CFA) and evaluated measurement invariance of the SWLS across gender by a multi-group CFA. Study 2 (N = 283, M age  = 17.34 years) and Study 3 (N = 220, M age  = 16.73 years) compared the convergent validity of the SWLS and single-item life satisfaction measures. The results of Study 1 supported the original one-factor model of the SWLS among adolescents and provided evidence for strong measurement invariance of the SWLS across gender. The findings of Study 2 and Study 3 showed that the SWLS and single-item measures were equally valid and strongly associated (r = .734 in Study 2 and r = .668 in Study 3). No substantial differences in correlations with school success and well-being indicators were found between the SWLS and single-item measures. Our findings support the use of the SWLS among adolescents and indicate that single-item life satisfaction measures perform as well as the SWLS in adolescent samples.

  13. Nursing Faculty Decision Making about Best Practices in Test Construction, Item Analysis, and Revision

    Science.gov (United States)

    Killingsworth, Erin Elizabeth

    2013-01-01

    With the widespread use of classroom exams in nursing education there is a great need for research on current practices in nursing education regarding this form of assessment. The purpose of this study was to explore how nursing faculty members make decisions about using best practices in classroom test construction, item analysis, and revision in…

  14. Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

    NARCIS (Netherlands)

    Andriessen, T.M.J.C.; Jong, B. de; Jacobs, B.; Werf, S.P. van der; Vos, P.E.

    2009-01-01

    PRIMARY OBJECTIVE: To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). METHODS: Daily

  15. Development of Abbreviated Nine-Item Forms of the Raven's Standard Progressive Matrices Test

    Science.gov (United States)

    Bilker, Warren B.; Hansen, John A.; Brensinger, Colleen M.; Richard, Jan; Gur, Raquel E.; Gur, Ruben C.

    2012-01-01

    The Raven's Standard Progressive Matrices (RSPM) is a 60-item test for measuring abstract reasoning, considered a nonverbal estimate of fluid intelligence, and often included in clinical assessment batteries and research on patients with cognitive deficits. The goal was to develop and apply a predictive model approach to reduce the number of items…

  16. Reading ability and print exposure: item response theory analysis of the author recognition test.

    Science.gov (United States)

    Moore, Mariah; Gordon, Peter C

    2015-12-01

    In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
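
The scoring rules mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function name `art_score`, the `foil_penalty` parameter, and the example names are hypothetical. With `foil_penalty=1.0` it reproduces the standard score (names selected minus foils selected); a larger value penalizes foil selections more heavily, which is the kind of adjustment the abstract describes.

```python
def art_score(selected, authors, foil_penalty=1.0):
    """Score one author recognition test (ART) response.

    selected: names the participant marked as authors
    authors: the real author names on the list; anything else is a foil
    foil_penalty: weight per selected foil (1.0 = standard scoring)
    """
    selected = set(selected)
    authors = set(authors)
    hits = len(selected & authors)    # real authors correctly recognized
    foils = len(selected - authors)   # foils wrongly marked as authors
    return hits - foil_penalty * foils


# Hypothetical checklist and response, for illustration only.
authors = {"Author A", "Author B", "Author C"}
response = ["Author A", "Author B", "Foil X"]

standard = art_score(response, authors)                     # 2 hits - 1 foil
penalized = art_score(response, authors, foil_penalty=2.0)  # 2 hits - 2 * 1 foil
```

The appropriate penalty weight is an empirical question; the value 2.0 here is only a placeholder.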

  17. A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing

    Science.gov (United States)

    Barrada, Juan Ramon; Olea, Julio; Ponsoda, Vicente; Abad, Francisco Jose

    2010-01-01

    In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or…

  18. Easy and Informative: Using Confidence-Weighted True-False Items for Knowledge Tests in Psychology Courses

    Science.gov (United States)

    Dutke, Stephan; Barenberg, Jonathan

    2015-01-01

    We introduce a specific type of item for knowledge tests, confidence-weighted true-false (CTF) items, and review experiences of its application in psychology courses. A CTF item is a statement about the learning content to which students respond whether the statement is true or false, and they rate their confidence level. Previous studies using…

  19. Single-event effect ground test issues

    International Nuclear Information System (INIS)

    Koga, R.

    1996-01-01

    Ground-based single event effect (SEE) testing of microcircuits permits characterization of device susceptibility to various radiation induced disturbances, including: (1) single event upset (SEU) and single event latchup (SEL) in digital microcircuits; (2) single event gate rupture (SEGR), and single event burnout (SEB) in power transistors; and (3) bit errors in photonic devices. These characterizations can then be used to generate predictions of device performance in the space radiation environment. This paper provides a general overview of ground-based SEE testing and examines in critical depth several underlying conceptual constructs relevant to the conduct of such tests and to the proper interpretation of results. These more traditional issues are contrasted with emerging concerns related to the testing of modern, advanced microcircuits

  20. The Single Item Literacy Screener: Evaluation of a brief instrument to identify limited reading ability

    Directory of Open Access Journals (Sweden)

    Chew Lisa D

    2006-03-01

    Background: Reading skills are important for accessing health information, using health care services, managing one's health and achieving desirable health outcomes. Our objective was to assess the diagnostic accuracy of the Single Item Literacy Screener (SILS) to identify limited reading ability, one component of health literacy, as measured by the S-TOFHLA. Methods: Cross-sectional interview with 999 adults with diabetes residing in Vermont and bordering states. Participants were randomly recruited from Primary Care practices in the Vermont Diabetes Information System June 2003 – December 2004. The main outcome was limited reading ability. The primary predictor was the SILS. Results: Of the 999 persons screened, 169 (17%) had limited reading ability. The sensitivity of the SILS in detecting limited reading ability was 54% [95% CI: 47%, 61%] and the specificity was 83% [95% CI: 81%, 86%] with an area under the Receiver Operating Characteristics Curve (ROC) of 0.73 [95% CI: 0.69, 0.78]. Seven hundred seventy (77%) screened negative on the SILS and 692 of these subjects had adequate reading skills (negative predictive value = 0.90 [95% CI: 0.88, 0.92]). Of the 229 who scored positive on the SILS, 92 had limited reading ability (positive predictive value = 0.4 [95% CI: 0.34, 0.47]). Conclusion: The SILS is a simple instrument designed to identify patients with limited reading ability who need help reading health-related materials. The SILS performs moderately well at ruling out limited reading ability in adults and allows providers to target additional assessment of health literacy skills to those most in need. Further study of the use of the SILS in clinical settings and with more diverse populations is warranted.
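
The accuracy figures in this abstract follow from a 2x2 table of screening result against reference standard. The sketch below rebuilds that table from the counts given (229 screened positive, of whom 92 had limited reading ability; 770 screened negative, of whom 692 had adequate skills). The function name and dictionary layout are illustrative assumptions. Note that these counts imply 170 participants with limited reading ability, one more than the 169 stated in the abstract; the sketch uses the derived counts as they stand.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return standard accuracy measures for a 2x2 screening table."""
    return {
        "sensitivity": tp / (tp + fn),  # limited readers flagged by the screener
        "specificity": tn / (tn + fp),  # adequate readers correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts derived from the abstract.
screened_positive, true_positive = 229, 92
screened_negative, true_negative = 770, 692

false_positive = screened_positive - true_positive   # 137
false_negative = screened_negative - true_negative   # 78

metrics = diagnostic_accuracy(
    tp=true_positive, fp=false_positive, fn=false_negative, tn=true_negative
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the results agree with the reported sensitivity (54%), specificity (83%), positive predictive value (0.4) and negative predictive value (0.90).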

  1. The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

    Science.gov (United States)

    Bodenburg, Sebastian; Dopslaff, Nina

    2008-01-01

    The Dysexecutive Questionnaire (DEX; Behavioral assessment of the dysexecutive syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have not been more fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on the data relating to the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 found factors and of the questionnaire as a whole is carried out on the basis of quartile standards.

  2. Prediction of true test scores from observed item scores and ancillary data.

    Science.gov (United States)

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.

  3. Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

    Science.gov (United States)

    Andriessen, Teuntje M J C; de Jong, Ben; Jacobs, Bram; van der Werf, Sieberen P; Vos, Pieter E

    2009-04-01

    To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). Daily testing was performed in 64 consecutively admitted traumatic brain injured patients, 22 orthopedically injured patients and 26 healthy controls until criteria for resolution of PTA were reached. Subjects were randomly assigned to a test with visual or verbal stimuli. Short delay reproduction was tested after an interval of 3-5 minutes, long delay reproduction was tested after 24 hours. Sensitivity and specificity were calculated over the first 4 test days. The 3-word test showed higher sensitivity than the 3-picture test, while specificity of the two tests was equally high. Free recall was a more effortful task than recognition for both patients and controls. In patients, a longer delay between registration and recall resulted in a significant decrease in the number of items reproduced. Presence of PTA is best assessed with a memory test that incorporates the free recall of words after a long delay.

  4. Robustness of two single-item self-esteem measures: cross-validation with a measure of stigma in a sample of psychiatric patients.

    Science.gov (United States)

    Bagley, Christopher

    2005-08-01

    Robins' Single-item Self-esteem Inventory was compared with a single item from the Coopersmith Self-esteem. Although a new scoring format was used, there was good evidence of cross-validation in 83 current and former psychiatric patients who completed Harvey's adapted measure of stigma felt and experienced by users of mental health services. Scores on the two single-item self-esteem measures correlated .76 (p self-esteem in users of mental health services.

  5. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

    Science.gov (United States)

    Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

    2014-05-01

    The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.

  6. Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

    Science.gov (United States)

    Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

    2018-02-23

    The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.

  7. Why Students Answer TIMSS Science Test Items the Way They Do

    Science.gov (United States)

    Harlow, Ann; Jones, Alister

    2004-04-01

    The purpose of this study was to explore how Year 8 students answered Third International Mathematics and Science Study (TIMSS) questions and whether the test questions represented the scientific understanding of these students. One hundred and seventy-seven students were tested using written test questions taken from the science test used in the Third International Mathematics and Science Study. The degree to which a sample of 38 children represented their understanding of the topics in a written test compared to the level of understanding that could be elicited by an interview is presented in this paper. In exploring student responses in the interview situation this study hoped to gain some insight into the science knowledge that students held and whether or not the test items had been able to elicit this knowledge successfully. We question the usefulness and quality of data from large-scale summative assessments on their own to represent student scientific understanding and conclude that large-scale written test items, such as TIMSS, on their own are not a valid way of exploring students' understanding of scientific concepts. Considerable caution is therefore needed in exploiting the outcomes of international achievement testing when considering educational policy changes or using TIMSS data on their own to represent student understanding.

  8. Redefining diagnostic symptoms of depression using Rasch analysis: testing an item bank suitable for DSM-V and computer adaptive testing.

    Science.gov (United States)

    Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S

    2011-10-01

    We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank, and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items accuracy was still high (AUC was still 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.

  9. On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

    Science.gov (United States)

    Raykov, Tenko; Marcoulides, George A.

    2016-01-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…

  10. Item analysis of single-peaked response data : the psychometric evaluation of bipolar measurement scales

    NARCIS (Netherlands)

    Polak, Maaike Geertruida

    2011-01-01

    The thesis explains the fundamental difference between unipolar and bipolar measurement scales for psychological characteristics. We explore the use of correspondence analysis (CA), a technique that is similar to principal component analysis and is available in SAS and SPSS, to select items that

  11. Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

    Science.gov (United States)

    Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

    2011-01-01

    Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and estimate the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
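
The abstract mentions an IRT model with parameters for student ability, question difficulty, and guessing, but does not name the model. One common model with exactly these ingredients is the three-parameter logistic (3PL) model, sketched below as an assumption rather than as the authors' actual choice. The function name, the parameter names, and the default discrimination of 1.0 are illustrative.

```python
import math


def p_correct(theta, difficulty, guessing=0.0, discrimination=1.0):
    """Probability of a correct answer under a 3PL item response model.

    theta: student ability
    difficulty: item difficulty, on the same scale as theta
    guessing: lower asymptote, the chance of answering correctly by guessing
    discrimination: slope of the item characteristic curve (assumed 1.0 here)
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# A student whose ability equals the item difficulty, on a four-option
# multiple-choice item where blind guessing succeeds 25% of the time.
p = p_correct(theta=0.0, difficulty=0.0, guessing=0.25)
```

Setting `guessing=0.0` reduces the expression to a two-parameter model, and additionally fixing `discrimination=1.0` gives the Rasch form.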

  12. Science Library of Test Items. Volume Nine. Mastery Testing Programme. [Mastery Tests Series 1.] Tests M1-M13.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of the first 13 tests are provided. Skills to be tested include: (1) reading a table; (2) using a biological key; (3) identifying chemical symbols; (4) identifying parts of a human body; (5) reading a line graph; (6) identifying electronic and…

  13. The Linear Logistic Test Model (LLTM) as the methodological foundation of item generating rules for a new verbal reasoning test

    Directory of Open Access Journals (Sweden)

    HERBERT POINSTINGL

    2009-06-01

    Based on the demand for new verbal reasoning tests to enrich psychological test inventory, a pilot version of a new test was analysed: the 'Family Relation Reasoning Test' (FRRT; Poinstingl, Kubinger, Skoda & Schechtner, forthcoming), in which several basic cognitive operations (logical rules) have been embedded/implemented. Given family relationships of varying complexity embedded in short stories, testees had to logically conclude the correct relationship between two individuals within a family. Using empirical data, the linear logistic test model (LLTM; Fischer, 1972), a special case of the Rasch model, was used to test the construct validity of the test: The hypothetically assumed basic cognitive operations had to explain the Rasch model's item difficulty parameters. After being shaped in LLTM's matrices of weights ((q_ij)), none of these operations were corroborated by means of the Andersen's Likelihood Ratio Test.
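
The core idea of the LLTM is that each Rasch item difficulty is a weighted sum of the difficulties of the basic operations the item requires, with the weights collected in a matrix. A minimal sketch of that decomposition follows; the weight matrix, the operation difficulties, and the function names are made-up illustrations and are not taken from the FRRT study.

```python
import math


def lltm_difficulties(q_matrix, eta, c=0.0):
    """Item difficulties as weighted sums of basic-operation difficulties.

    q_matrix[i][j]: weight of operation j in item i
    eta[j]: difficulty contribution of operation j
    c: normalization constant
    """
    return [sum(q * e for q, e in zip(row, eta)) + c for row in q_matrix]


def rasch_probability(theta, beta):
    """Rasch model: probability that a person with ability theta solves
    an item with difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical example: three items built from two logical operations.
q_matrix = [
    [1, 0],  # item 1 needs operation A only
    [0, 1],  # item 2 needs operation B only
    [1, 1],  # item 3 needs both operations
]
eta = [0.5, 1.0]

difficulties = lltm_difficulties(q_matrix, eta)
p_item3 = rasch_probability(theta=1.5, beta=difficulties[2])
```

Testing construct validity then amounts to checking whether difficulties reconstructed this way fit the data about as well as freely estimated Rasch difficulties, which is what the likelihood ratio test in the abstract addresses.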

  14. Science Library of Test Items. Volume Three. Mastery Testing Programme. Introduction and Manual.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    A set of short tests aimed at measuring student mastery of specific skills in the natural sciences are presented with a description of the mastery program's purposes, development, and methods. Mastery learning, criterion-referenced testing, and the scope of skills to be tested are defined. Each of the multiple choice tests for grades 7 through 10…

  15. RT-based memory detection : Item saliency effects in the single-probe and the multiple-probe protocol

    NARCIS (Netherlands)

    Verschuere, B.; Kleinberg, B.; Theocharidou, K.

    RT-based memory detection may provide an efficient means to assess recognition of concealed information. There is, however, considerable heterogeneity in detection rates, and we explored two potential moderators: item saliency and test protocol. Participants tried to conceal low salient (e.g.,

  16. Evaluation of a single-item screening question to detect limited health literacy in peritoneal dialysis patients.

    Science.gov (United States)

    Jain, Deepika; Sheth, Heena; Bender, Filitsa H; Weisbord, Steven D; Green, Jamie A

    2014-01-01

    Studies have shown that a single-item question might be useful in identifying patients with limited health literacy. However, the utility of the approach has not been studied in patients receiving maintenance peritoneal dialysis (PD). We assessed health literacy in a cohort of 31 PD patients by administering the Rapid Estimate of Adult Literacy in Medicine (REALM) and a single-item health literacy (SHL) screening question "How confident are you filling out medical forms by yourself?" (Extremely, Quite a bit, Somewhat, A little bit, or Not at all). To determine the accuracy of the single-item question for detecting limited health literacy, we performed sensitivity and specificity analyses of the SHL and plotted the area under the receiver operating characteristic (AUROC) curve using the REALM as a reference standard. Using a cut-off of "Somewhat" or less confident, the sensitivity of the SHL for detecting limited health literacy was 80%, and the specificity was 88%. The positive likelihood ratio was 6.9. The SHL had an AUROC of 0.79 (95% confidence interval: 0.52 to 1.00). Our results show that the SHL could be effective in detecting limited health literacy in PD patients.
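
The positive likelihood ratio reported here is defined as sensitivity divided by (1 − specificity). The sketch below applies that definition to the rounded percentages from the abstract; the function name is an illustrative assumption. The rounded inputs give roughly 6.7 rather than the reported 6.9, a gap that presumably reflects the authors computing from unrounded values.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """How much more likely a positive screen is in people with the
    condition than in people without it."""
    return sensitivity / (1.0 - specificity)


# Rounded values from the abstract: sensitivity 80%, specificity 88%.
lr_plus = positive_likelihood_ratio(0.80, 0.88)
print(f"LR+ = {lr_plus:.2f}")
```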

  17. Science Library of Test Items. Volume Ten. Mastery Testing Programme. [Mastery Tests Series 2.] Tests M14-M26.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 14 through 26 include: (14) calculating an average; (15) identifying parts of the scientific method; (16) reading a geological map; (17) identifying elements, mixtures and compounds; (18) using Ohm's law in calculation; (19) interpreting…

  18. Science Library of Test Items. Volume Twelve. Mastery Testing Programme. [Mastery Tests Series 4.] Tests M39-M50.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 39 through 50 include: (39) using a code; (40) naming the parts of a microscope; (41) calculating density and predicting flotation; (42) estimating metric length; (43) using SI symbols; (44) using s=vt; (45) applying a novel theory; (46)…

  19. Science Library of Test Items. Volume Thirteen. Mastery Testing Program. [Mastery Tests Series 5.] Tests M51-M65.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 51 through 65 include: (51) interpreting atomic and mass numbers; (52) extrapolating from a geological map; (53) matching geological sections and maps; (54) identifying parts of the human eye; (55) identifying the functions of parts of a…

  20. Science Library of Test Items. Volume Eleven. Mastery Testing Programme. [Mastery Tests Series 3.] Tests M27-M38.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 27 through 38 include: (27) reading a grid plan; (28) identifying common invertebrates; (29) characteristics of invertebrates; (30) identifying elements; (31) using scientific notation part I; (32) classifying minerals; (33) predicting the…

  1. Test-retest reliability of selected items of Health Behaviour in School-aged Children (HBSC) survey questionnaire in Beijing, China

    Directory of Open Access Journals (Sweden)

    Liu Yang

    2010-08-01

    Background: Children's health and health behaviour are essential for their development and it is important to obtain abundant and accurate information to understand young people's health and health behaviour. The Health Behaviour in School-aged Children (HBSC) study is among the first large-scale international surveys on adolescent health through self-report questionnaires. So far, more than 40 countries in Europe and North America have been involved in the HBSC study. The purpose of this study is to assess the test-retest reliability of selected items in the Chinese version of the HBSC survey questionnaire in a sample of adolescents in Beijing, China. Methods: A sample of 95 male and female students aged 11 or 15 years old participated in a test and retest with a three-week interval. Student identity numbers of respondents were utilized to permit matching of test-retest questionnaires. 23 items concerning physical activity, sedentary behaviour, sleep and substance use were evaluated by using the percentage of response shifts and the single measure Intraclass Correlation Coefficients (ICC) with 95% confidence interval (CI) for all respondents and stratified by gender and age. Items on substance use were only evaluated for school children aged 15 years old. Results: The percentage of no response shift between test and retest varied from 32% for the item on computer use at weekends to 92% for the three items on smoking. Of all the 23 items evaluated, 6 items (26%) showed a moderate reliability, 12 items (52%) displayed a substantial reliability and 4 items (17%) indicated almost perfect reliability. No gender and age group difference of the test-retest reliability was found except for a few items on sedentary behaviour. Conclusions: The overall findings of this study suggest that most selected indicators in the HBSC survey questionnaire have satisfactory test-retest reliability for the students in Beijing. Further test-retest studies in a large

  2. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    Science.gov (United States)

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  3. Applications of Multidimensional Item Response Theory Models with Covariates to Longitudinal Test Data. Research Report. ETS RR-16-21

    Science.gov (United States)

    Fu, Jianbin

    2016-01-01

    The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…

  4. Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory.

    Science.gov (United States)

    Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal

    2017-01-01

    The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Few studies on its validity can yet be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor model and a three-factor model were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were common with CTT. This study tested different models, with fewer problematic items found in a three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.

  5. Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

    Science.gov (United States)

    Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

    2010-01-01

    This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…

  6. A unified factor-analytic approach to the detection of item and test bias: Illustration with the effect of providing calculators to students with dyscalculia

    Directory of Open Access Journals (Sweden)

    Lee, M. K.

    2016-01-01

    An absence of measurement bias against distinct groups is a prerequisite for the use of a given psychological instrument in scientific research or high-stakes assessment. Factor analysis is the framework explicitly adopted for the identification of such bias when the instrument consists of a multi-test battery, whereas item response theory is employed when the focus narrows to a single test composed of discrete items. Item response theory can be treated as a mild nonlinearization of the standard factor model, and thus the essential unity of bias detection at the two levels merits greater recognition. Here we illustrate the benefits of a unified approach with a real-data example, which comes from a statewide test of mathematics achievement where examinees diagnosed with dyscalculia were accommodated with calculators. We found that items that can be solved by explicit arithmetical computation became easier for the accommodated examinees, but the quantitative magnitude of this differential item functioning (measurement bias) was small.

  7. Adaptation and validation into Portuguese language of the six-item cognitive impairment test (6CIT).

    Science.gov (United States)

    Apóstolo, João Luís Alves; Paiva, Diana Dos Santos; Silva, Rosa Carla Gomes da; Santos, Eduardo José Ferreira Dos; Schultz, Timothy John

    2017-07-25

    The six-item cognitive impairment test (6CIT) is a brief cognitive screening tool that can be administered to older people in 2-3 min. The aim was to adapt the 6CIT for European Portuguese and determine its psychometric properties based on a sample recruited from several contexts (nursing homes; universities for older people; day centres; primary health care units). The original 6CIT was translated into Portuguese and the draft Portuguese version (6CIT-P) was back-translated and piloted. The accuracy of the 6CIT-P was assessed by comparison with the Portuguese Mini-Mental State Examination (MMSE). A convenience sample of 550 older people from various geographical locations in the north and centre of the country was used. The test-retest reliability coefficient was high (r = 0.95). The 6CIT-P also showed good internal consistency (α = 0.88) and corrected item-total correlations ranged between 0.32 and 0.90. Total 6CIT-P and MMSE scores were strongly correlated. The proposed 6CIT-P threshold for cognitive impairment is ≥10 in the Portuguese population, which gives a sensitivity of 82.78% and a specificity of 84.84%. The accuracy of the 6CIT-P, as measured by area under the ROC curve, was 0.91. The 6CIT-P has high reliability and validity and is accurate when used to screen for cognitive impairment.

  8. Analyzing Item Generation with Natural Language Processing Tools for the "TOEIC"® Listening Test. Research Report. ETS RR-17-52

    Science.gov (United States)

    Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin

    2017-01-01

    In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…

  9. Ansys Benchmark of the Single Heater Test

    International Nuclear Information System (INIS)

    H.M. Wade; H. Marr; M.J. Anderson

    2006-01-01

    The Single Heater Test (SHT) is the first of three in-situ thermal tests included in the site characterization program for the potential nuclear waste monitored geologic repository at Yucca Mountain. The heating phase of the SHT started in August 1996 and was concluded in May 1997 after 9 months of heating. Cooling continued until January 1998, at which time post-test characterization of the test block commenced. Numerous thermal, hydrological, mechanical, and chemical sensors monitored the coupled processes in the unsaturated fractured rock mass around the heater (CRWMS M and O 1999). The objective of this calculation is to benchmark a numerical simulation of the rock mass thermal behavior against the extensive data set that is available from the thermal test. The scope is limited to three-dimensional (3-D) numerical simulations of the computational domain of the Single Heater Test and surrounding rock mass. This calculation supports the waste package thermal design methodology and was developed by the Waste Package Department (WPD) under Office of Civilian Radioactive Waste Management (OCRWM) procedure AP-3.12Q, Revision 0, ICN 3, BSCN 1, Calculations.

  10. A mathematical model for order splitting in a multiple supplier single-item inventory system

    DEFF Research Database (Denmark)

    Abginehchi, Soheil; Farahani, Reza Zanjirani; Rezapour, Shabnam

    2013-01-01

    systems. The item acquisition lead times of suppliers are random variables. Backorder is allowed and shortage cost is charged based on not only per unit in shortage but also per time unit. Continuous review (s,Q) policy has been assumed. When the inventory level depletes to a reorder level, the total...... order is split among n suppliers. Since the suppliers have different characteristics, the quantity ordered to different suppliers may be different. The problem is to determine the reorder level and quantity ordered to each supplier so that the expected total cost per time unit, including ordering cost......, procurement cost, inventory holding cost, and shortage cost, is minimized. We also conduct extensive numerical experiments to show the advantages of our model compared with the models in the literature. According to our extensive experiments, the model developed in this paper is the best model...

  11. Inventory Levels of Komatsu-Brand Forklift Spare Parts Using a Single-Item Inventory Model Approach

    Directory of Open Access Journals (Sweden)

    Wahid Ahmad Jauhari

    2006-04-01

    The control and maintenance of inventories is a problem common to all enterprises in any sector of a given economy. Two fundamental questions that must be answered in controlling inventory are when to replenish the inventory and how much to order for replenishment. The (Q,r) inventory models attempt to answer these two questions under a variety of circumstances. Studies have shown (1) that a company that ignores lead-time demand variability may suffer great financial damage, (2) that the gamma distribution most commonly provides the best fit to lead-time demand for a variety of inventory items, and (3) that a fixed lead-time demand assumption or a normal approximation to it will often yield significant errors (Namit and Chen, 1998). This research implemented an efficient and accurate algorithm for solving the (Q,r) inventory model with gamma lead-time demand.
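
    As a rough illustration of the two questions a (Q,r) policy answers (how much to order and when to reorder), the sketch below fits a gamma distribution to lead-time demand by the method of moments and reads the reorder point off a simulated quantile. It is not the algorithm of the paper or of Namit and Chen: the classical EOQ formula for Q, the service-level criterion for r, the Monte Carlo quantile, and all function names and numbers are assumptions made for illustration.

```python
import math
import random


def gamma_from_moments(mean, variance):
    """Method-of-moments gamma fit: returns (shape, scale)."""
    shape = mean * mean / variance
    scale = variance / mean
    return shape, scale


def eoq(annual_demand, order_cost, holding_cost):
    """Classical economic order quantity, used here as a stand-in for Q."""
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost)


def reorder_point(shape, scale, service_level, n_draws=100_000, seed=0):
    """Approximate r as the service-level quantile of gamma lead-time demand.

    The quantile is estimated by simulation so that only the standard
    library is needed; a closed-form inverse CDF would be more precise.
    """
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(shape, scale) for _ in range(n_draws))
    idx = min(n_draws - 1, int(service_level * n_draws))
    return draws[idx]


if __name__ == "__main__":
    # Hypothetical spare part: lead-time demand with mean 100 and variance 2500.
    shape, scale = gamma_from_moments(100.0, 2500.0)
    q = eoq(annual_demand=1200.0, order_cost=50.0, holding_cost=3.0)
    r = reorder_point(shape, scale, service_level=0.95)
    print(f"shape={shape:.1f} scale={scale:.1f} Q={q:.0f} r={r:.0f}")
```

    Because the gamma distribution is right-skewed, the reorder point obtained this way tends to sit above the one a normal approximation with the same mean and variance would give at high service levels, which is consistent with the abstract's warning about normal approximations.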

  12. Enactment versus observation: item-specific and relational processing in goal-directed action sequences (and lists of single actions).

    Directory of Open Access Journals (Sweden)

    Janette Schult

    What are the memory-related consequences of learning actions (such as "apply the patch") by enactment during study, as compared to action observation? Theories converge in postulating that enactment encoding increases item-specific processing, but not the processing of relational information. Typically, in the laboratory enactment encoding is studied for lists of unrelated single actions in which one action execution has no overarching purpose or relation with other actions. In contrast, real-life actions are usually carried out with the intention to achieve such a purpose. When actions are embedded in action sequences, relational information provides efficient retrieval cues. We contrasted memory for single actions with memory for action sequences in three experiments. We found more reliance on relational processing for action sequences than for single actions. To what degree can this relational information be used after enactment versus after the observation of an actor? We found indicators of superior relational processing after observation than enactment in ordered pair recall (Experiment 1A) and in emerging subjective organization of repeated recall protocols (recall runs 2-3, Experiment 2). An indicator of superior item-specific processing after enactment compared to observation was recognition (Experiment 1B, Experiment 2). Similar net recall suggests that observation can be as good a learning strategy as enactment. We discuss possible reasons why these findings only partly converge with previous research and theorizing.

  13. Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

    Science.gov (United States)

    Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

    2015-01-01

    Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…

  14. Performance on large-scale science tests: Item attributes that may impact achievement scores

    Science.gov (United States)

    Gordon, Janet Victoria

    , characteristics of test items themselves and/or opportunities to learn. Suggestions for future research are made.

  15. Do Self Concept Tests Test Self Concept? An Evaluation of the Validity of Items on the Piers Harris and Coopersmith Measures.

    Science.gov (United States)

    Lynch, Mervin D.; Chaves, John

    Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs: idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…

  16. Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics

    Directory of Open Access Journals (Sweden)

    Frank Goldhammer

    2017-11-01

    Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods: Analyses were based on the Canadian sample of Round 1 of the PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results: The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant's native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item's location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions: The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement).

  17. Detection of advance item knowledge using response times in computer adaptive testing

    NARCIS (Netherlands)

    Meijer, R.R.; Sotaridona, Leonardo

    2006-01-01

    We propose a new method for detecting item preknowledge in a CAT based on an estimate of “effective response time” for each item. Effective response time is defined as the time required for an individual examinee to answer an item correctly. An unusually short response time relative to the expected

  18. Randomization and Data-Analysis Items in Quality Standards for Single-Case Experimental Studies

    Science.gov (United States)

    Heyvaert, Mieke; Wendt, Oliver; Van den Noortgate, Wim; Onghena, Patrick

    2015-01-01

    Reporting standards and critical appraisal tools serve as beacons for researchers, reviewers, and research consumers. Parallel to existing guidelines for researchers to report and evaluate group-comparison studies, single-case experimental (SCE) researchers are in need of guidelines for reporting and evaluating SCE studies. A systematic search was…

  19. Concreteness effects in short-term memory: a test of the item-order hypothesis.

    Science.gov (United States)

    Roche, Jaclynn; Tolan, G Anne; Tehan, Gerald

    2011-12-01

    The following experiments explore word length and concreteness effects in short-term memory within an item-order processing framework. This framework asserts order memory is better for those items that are relatively easy to process at the item level. However, words that are difficult to process benefit at the item level for increased attention/resources being applied. The prediction of the model is that differential item and order processing can be detected in episodic tasks that differ in the degree to which item or order memory are required by the task. The item-order account has been applied to the word length effect such that there is a short word advantage in serial recall but a long word advantage in item recognition. The current experiment considered the possibility that concreteness effects might be explained within the same framework. In two experiments, word length (Experiment 1) and concreteness (Experiment 2) are examined using forward serial recall, backward serial recall, and item recognition. These results for word length replicate previous studies showing the dissociation in item and order tasks. The same was not true for the concreteness effect. In all three tasks concrete words were better remembered than abstract words. The concreteness effect cannot be explained in terms of an item-order trade off.

  20. Development of coordination system model on single-supplier multi-buyer for multi-item supply chain with probabilistic demand

    Science.gov (United States)

    Olivia, G.; Santoso, A.; Prayogo, D. N.

    2017-11-01

    Nowadays, competition between supply chains is getting tighter, and a good coordination system between supply chain members is crucial in addressing the issue. This paper focuses on developing a model of a coordination system between a single supplier and multiple buyers in a supply chain as a solution. The proposed optimization model was designed to determine the optimal number of deliveries from a supplier to buyers in order to minimize the total cost over a planning horizon. The total supply chain cost consists of transportation costs, handling costs of the supplier and buyers, and stock-out costs. In the proposed optimization model, the supplier can supply various types of items to retailers whose item demand patterns are probabilistic. Sensitivity analysis of the proposed model was conducted to test the effect of changes in transport costs, handling costs and production capacities of the supplier. The results of the sensitivity analysis showed that changes in transportation costs, handling costs and production capacity significantly influence the decisions on the optimal number of deliveries of each item to the buyers.

  1. Determination of radionuclides in environmental test items at CPHR: traceability and uncertainty calculation.

    Science.gov (United States)

    Carrazana González, J; Fernández, I M; Capote Ferrera, E; Rodríguez Castro, G

    2008-11-01

    Information about how the laboratory of Centro de Protección e Higiene de las Radiaciones (CPHR), Cuba establishes its traceability to the International System of Units for the measurement of radionuclides in environmental test items is presented. A comparison among different methodologies of uncertainty calculation, including an analysis of the feasibility of using the Kragten-spreadsheet approach, is shown. In the specific case of the gamma spectrometric assay, the influence of each parameter, and the identification of the major contributor, in the relative difference between the methods of uncertainty calculation (Kragten and partial derivative) is described. The reliability of the uncertainty calculation results reported by the commercial software Gamma 2000 from Silena is analyzed.
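
    The comparison between the Kragten and partial-derivative approaches mentioned above can be sketched numerically. In the Kragten (spreadsheet) approach each input is shifted by its standard uncertainty in turn, and the resulting changes in the output are combined in quadrature; for a pure product/quotient model the partial-derivative result reduces to combining relative uncertainties in quadrature. The measurement model below (activity concentration from net counts, efficiency, emission probability, counting time and sample mass) and all numerical values are assumptions for illustration, not the CPHR laboratory's actual model or data.

```python
import math


def activity(net_counts, efficiency, emission_prob, live_time, mass):
    """Hypothetical gamma-spectrometry model: activity concentration in Bq/kg."""
    return net_counts / (efficiency * emission_prob * live_time * mass)


def kragten(model, values, uncerts):
    """Kragten approach: shift one input at a time by its standard uncertainty."""
    y0 = model(*values)
    contributions = []
    for i, u in enumerate(uncerts):
        shifted = list(values)
        shifted[i] += u
        contributions.append(model(*shifted) - y0)
    combined = math.sqrt(sum(c * c for c in contributions))
    return y0, combined, contributions


def relative_uncertainty_analytic(values, uncerts):
    """Partial-derivative result for a product/quotient model."""
    return math.sqrt(sum((u / v) ** 2 for v, u in zip(values, uncerts)))


if __name__ == "__main__":
    # Illustrative values: counts, efficiency, emission probability, time (s), mass (kg).
    values = (10000.0, 0.02, 0.85, 3600.0, 0.5)
    uncerts = (100.0, 0.0004, 0.0, 0.0, 0.001)
    y0, u_kragten, contribs = kragten(activity, values, uncerts)
    u_analytic = y0 * relative_uncertainty_analytic(values, uncerts)
    major = max(range(len(contribs)), key=lambda i: abs(contribs[i]))
    print(f"A = {y0:.1f} Bq/kg, u(Kragten) = {u_kragten:.2f}, "
          f"u(analytic) = {u_analytic:.2f}, major contributor index = {major}")
```

    The two estimates differ slightly because the Kragten shifts are finite differences rather than derivatives; the list of per-input contributions also makes the dominant uncertainty source easy to spot.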

  2. Determination of radionuclides in environmental test items at CPHR: Traceability and uncertainty calculation

    International Nuclear Information System (INIS)

    Carrazana Gonzalez, J.; Fernandez, I.M.; Capote Ferrera, E.; Rodriguez Castro, G.

    2008-01-01

    Information about how the laboratory of Centro de Proteccion e Higiene de las Radiaciones (CPHR), Cuba establishes its traceability to the International System of Units for the measurement of radionuclides in environmental test items is presented. A comparison among different methodologies of uncertainty calculation, including an analysis of the feasibility of using the Kragten-spreadsheet approach, is shown. In the specific case of the gamma spectrometric assay, the influence of each parameter, and the identification of the major contributor, in the relative difference between the methods of uncertainty calculation (Kragten and partial derivative) is described. The reliability of the uncertainty calculation results reported by the commercial software Gamma 2000 from Silena is analyzed.

  3. Assessing the discriminating power of item and test scores in the linear factor-analysis model

    Directory of Open Access Journals (Sweden)

    Pere J. Ferrando

    2012-01-01

    Rigorous, model-based proposals for studying the imprecise concept of "discriminating power" are scarce and generally limited to nonlinear models for binary items. This article proposes a general framework for assessing the discriminating power of item and test scores that are calibrated with the common one-factor model. The proposal is organized around three criteria: (a) type of score, (b) range of discrimination, and (c) the specific aspect that is assessed. Within the proposed framework, (a) 16 measures, 6 of which appear to be new, are discussed, and (b) the relations among them are studied. The usefulness of the proposal in psychometric applications that use the factor model is illustrated with an empirical example.

  4. A Case Study on an Item Writing Process: Use of Test Specifications, Nature of Group Dynamics, and Individual Item Writers' Characteristics

    Science.gov (United States)

    Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa

    2010-01-01

    This article discusses a case study on an item writing process that reflects on our practical experience in an item development project. The purpose of the article is to share the lessons from that experience, with the aim of demystifying the item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…

  5. Developing a Numerical Ability Test for Students of Education in Jordan: An Application of Item Response Theory

    Science.gov (United States)

    Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader

    2016-01-01

    The purpose of the present study is to develop a test to measure the numerical ability of students of education. The sample of the study consisted of 504 students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed that acceptable psychometric properties of the test;…

  6. 48 CFR 245.7101-3 - DD Form 1348-1, DoD Single Line Item Release/Receipt Document.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false DD Form 1348-1, DoD Single Line Item Release/Receipt Document. 245.7101-3 Section 245.7101-3 Federal Acquisition Regulations... PROPERTY Plant Clearance Forms 245.7101-3 DD Form 1348-1, DoD Single Line Item Release/Receipt Document...

  7. The work ability index and single-item question: associations with sick leave, symptoms, and health--a prospective study of women on long-term sick leave.

    Science.gov (United States)

    Ahlstrom, Linda; Grimby-Ekman, Anna; Hagberg, Mats; Dellve, Lotta

    2010-09-01

    This study investigated the association between the work ability index (WAI) and the single-item question on work ability among women working in human service organizations (HSO) currently on long-term sick leave. It also examined the association between the WAI and the single-item question in relation to sick leave, symptoms, and health. Predictive values of the WAI, the changed WAI, the single-item question and the changed single-item question were investigated for degree of sick leave, symptoms, and health. This cohort study comprised 324 HSO female workers on long-term (>60 days) sick leave, with follow-ups at 6 and 12 months. Participants responded to questionnaires. Data on work ability, sick leave, health, and symptoms were analyzed with regard to associations and predictability. Spearman correlation and mixed-model analysis were performed for repeated measurements over time. The study showed a very strong association between the WAI and the single-item question among all participants. Both the WAI and the single-item question showed similar patterns of associations with sick leave, health, and symptoms. The predictive value for the degree of sick leave and health-related quality of life (HRQoL) was strong for both the WAI and the single-item question, and slightly less strong for vitality, neck pain, both self-rated general and mental health, and behavioral and current stress. This study suggests that the single-item question on work ability could be used as a simple indicator for assessing the status and progress of work ability among women on long-term sick leave.

  8. Diagnostic accuracy of a two-item Drug Abuse Screening Test (DAST-2).

    Science.gov (United States)

    Tiet, Quyen Q; Leyva, Yani E; Moos, Rudolf H; Smith, Brandy

    2017-11-01

    Drug use is prevalent and costly to society, but individuals with drug use disorders (DUDs) are under-diagnosed and under-treated, particularly in primary care (PC) settings. Drug screening instruments have been developed to identify patients with DUDs and facilitate treatment. The Drug Abuse Screening Test (DAST) is one of the most well-known drug screening instruments. However, similar to many such instruments, it is too long for routine use in busy PC settings. This study developed and validated a briefer and more practical DAST for busy PC settings. We recruited 1300 PC patients in two Department of Veterans Affairs (VA) clinics. Participants responded to a structured diagnostic interview. We randomly selected half of the sample to develop and the other half to validate the new instrument. We employed signal detection techniques to select the best DAST items to identify DUDs (based on the MINI) and negative consequences of drug use (measured by the Inventory of Drug Use Consequences). Performance indicators were calculated. The two-item DAST (DAST-2) was 97% sensitive and 91% specific for DUDs in the development sample and 95% sensitive and 89% specific in the validation sample. It was highly sensitive and specific for DUD and negative consequences of drug use in subgroups of patients, including gender, age, race/ethnicity, marital status, educational level, and posttraumatic stress disorder status. The DAST-2 is an appropriate drug screening instrument for routine use in PC settings in the VA and may be applicable in a broader range of PC clinics.
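
    For readers unfamiliar with the performance indicators quoted above, the sketch below shows how sensitivity and specificity are computed from screening results and a reference diagnosis. The function name and the toy data are hypothetical; the study's own item-selection procedure (signal detection techniques on a development half-sample) is not reproduced here.

```python
def screening_accuracy(screen_positive, has_disorder):
    """Return (sensitivity, specificity) for paired boolean sequences.

    screen_positive[i]: the screener flagged patient i
    has_disorder[i]:    the reference interview diagnosed patient i
    Assumes at least one patient with and one without the disorder.
    """
    tp = fp = tn = fn = 0
    for flagged, diseased in zip(screen_positive, has_disorder):
        if diseased and flagged:
            tp += 1          # true positive
        elif diseased and not flagged:
            fn += 1          # false negative
        elif not diseased and flagged:
            fp += 1          # false positive
        else:
            tn += 1          # true negative
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data for six hypothetical patients.
    screen = [True, True, False, False, True, False]
    truth = [True, False, False, False, True, True]
    sens, spec = screening_accuracy(screen, truth)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

    Computing the two indicators separately on a development half and a validation half, as the study did, guards against overstating accuracy on the same data used to pick the items.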

  9. Branched Adaptive Testing with a Rasch-Model-Calibrated Test: Analysing Item Presentation's Sequence Effects Using the Rasch-Model-Based LLTM

    Science.gov (United States)

    Kubinger, Klaus D.; Reif, Manuel; Yanagida, Takuya

    2011-01-01

    Item position effects provoke serious problems within adaptive testing. This is because different testees are necessarily presented with the same item at different presentation positions, as a consequence of which comparing their ability parameter estimations in the case of such effects would not at all be fair. In this article, a specific…

  10. A leukocyte activation test identifies food items which induce release of DNA by innate immune peripheral blood leucocytes.

    Science.gov (United States)

    Garcia-Martinez, Irma; Weiss, Theresa R; Yousaf, Muhammad N; Ali, Ather; Mehal, Wajahat Z

    2018-01-01

    Leukocyte activation (LA) testing identifies food items that induce a patient specific cellular response in the immune system, and has recently been shown in a randomized double blinded prospective study to reduce symptoms in patients with irritable bowel syndrome (IBS). We hypothesized that test reactivity to particular food items, and the systemic immune response initiated by these food items, is due to the release of cellular DNA from blood immune cells. We tested this by quantifying total DNA concentration in the cellular supernatant of immune cells exposed to positive and negative foods from 20 healthy volunteers. To establish if the DNA release by positive samples is a specific phenomenon, we quantified myeloperoxidase (MPO) in cellular supernatants. We further assessed if a particular immune cell population (neutrophils, eosinophils, and basophils) was activated by the positive food items by flow cytometry analysis. To identify the signaling pathways that are required for DNA release we tested if specific inhibitors of key signaling pathways could block DNA release. Foods with a positive LA test result gave a higher supernatant DNA content when compared to foods with a negative result. This was specific as MPO levels were not increased by foods with a positive LA test. Protein kinase C (PKC) inhibitors resulted in inhibition of positive food stimulated DNA release. Positive foods resulted in CD63 levels greater than negative foods in eosinophils in 76.5% of tests. LA test identifies food items that result in release of DNA and activation of peripheral blood innate immune cells in a PKC dependent manner, suggesting that this LA test identifies food items that result in release of inflammatory markers and activation of innate immune cells. This may be the basis for the improvement in symptoms in IBS patients who followed an LA test guided diet.

  11. Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

    Science.gov (United States)

    Wan, Lei; Henly, George A.

    2012-01-01

    Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

  12. Developing and testing items for the South African Personality Inventory (SAPI)

    Directory of Open Access Journals (Sweden)

    Carin Hill

    2013-11-01

    Research purpose: This article reports on the process of identifying items for, and provides a quantitative evaluation of, the South African Personality Inventory (SAPI) items. Motivation for the study: The study intended to develop an indigenous and psychometrically sound personality instrument that adheres to the requirements of South African legislation and excludes cultural bias. Research design, approach and method: The authors used a cross-sectional design. They measured the nine SAPI clusters identified in the qualitative stage of the SAPI project in 11 separate quantitative studies. Convenience sampling yielded 6735 participants. Statistical analysis focused on the construct validity and reliability of items. The authors eliminated items that showed poor performance, based on common psychometric criteria, and selected the best performing items to form part of the final version of the SAPI. Main findings: The authors developed 2573 items from the nine SAPI clusters. Of these, 2268 items were valid and reliable representations of the SAPI facets. Practical/managerial implications: The authors developed a large item pool. It measures personality in South Africa. Researchers can refine it for the SAPI. Furthermore, the project illustrates an approach that researchers can use in projects that aim to develop culturally-informed psychological measures. Contribution/value-add: Personality assessment is important for recruiting, selecting and developing employees. This study contributes to the current knowledge about the early processes researchers follow when they develop a personality instrument that measures personality fairly in different cultural groups, as the SAPI does.

  13. Item and test analysis to identify quality multiple choice questions (MCQs) from an assessment of medical students of Ahmedabad, Gujarat

    Directory of Open Access Journals (Sweden)

    Sanju Gajjar

    2014-01-01

    Background: Multiple choice questions (MCQs) are frequently used to assess students in different educational streams because of their objectivity and wide coverage in less time. However, the MCQs used must be of good quality, which depends on their difficulty index (DIF I), discrimination index (DI) and distractor efficiency (DE). Objective: To evaluate MCQs or items and develop a pool of valid items by assessing them with DIF I, DI and DE, and to revise, store or discard items based on the results. Settings: The study was conducted in a medical school in Ahmedabad. Materials and Methods: An internal examination in Community Medicine was conducted after 40 hours of teaching during 1st MBBS and was attended by 148 of 150 students. A total of 50 MCQs or items and 150 distractors were analyzed. Statistical Analysis: Data were entered and analyzed in MS Excel 2007; simple proportions, means, standard deviations and coefficients of variation were calculated, and the unpaired t test was applied. Results: Out of 50 items, 24 had "good to excellent" DIF I (31 - 60%) and 15 had "good to excellent" DI (> 0.25). Mean DE was 88.6%, considered ideal/acceptable, and non-functional distractors (NFDs) were only 11.4%. Mean DI was 0.14. Poor DI (< 0.15), with negative DI in 10 items, indicates poor preparedness of students and some issues with the framing of at least some of the MCQs. An increased proportion of NFDs (incorrect alternatives selected by < 5% of students) in an item decreases DE and makes the item easier. There were 15 items with 17 NFDs, while the remaining items had no NFDs, with a mean DE of 100%. Conclusion: The study emphasizes the selection of quality MCQs that truly assess knowledge and are able to differentiate between students of different abilities.
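
    The three indices named in this abstract can be sketched for a single item as follows. This is a minimal illustration, not the authors' spreadsheet procedure: the function name `item_analysis`, the four-option layout, and the upper/lower 27% grouping for DI are assumptions (the abstract does not state how the groups were formed). Only the < 5% cutoff for a non-functional distractor comes from the abstract.

```python
from typing import Dict, List


def item_analysis(responses: List[str], totals: List[float], key: str,
                  options: str = "ABCD", nfd_cutoff: float = 0.05) -> Dict[str, float]:
    """Classical statistics for one multiple-choice item (hypothetical helper).

    responses: option chosen by each student for this item.
    totals:    each student's total test score, in the same order.
    key:       the correct option.
    """
    n = len(responses)

    # Difficulty index (DIF I): percentage of students answering correctly.
    dif_i = 100.0 * sum(r == key for r in responses) / n

    # Discrimination index (DI): correct rate in the top-scoring group minus
    # the correct rate in the bottom-scoring group. The 27% group size is a
    # common convention and is assumed here.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    g = max(1, round(0.27 * n))
    high = sum(responses[i] == key for i in order[:g])
    low = sum(responses[i] == key for i in order[-g:])
    di = (high - low) / g

    # Distractor efficiency (DE): percentage of distractors that are
    # functional, i.e. chosen by at least `nfd_cutoff` of students.
    distractors = [o for o in options if o != key]
    nfd = sum(responses.count(o) / n < nfd_cutoff for o in distractors)
    de = 100.0 * (len(distractors) - nfd) / len(distractors)

    return {"dif_i": dif_i, "di": di, "nfd": nfd, "de": de}


# Toy data for ten students (not from the study).
toy = item_analysis(
    responses=["A", "A", "A", "A", "A", "B", "B", "C", "A", "B"],
    totals=[10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    key="A",
)
```

    In the toy data option D is never chosen, so it counts as one non-functional distractor and DE drops to two thirds of the three distractors.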

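  14. De item-reeks van de cognitieve screening test vergeleken met die van de mini-mental state examination [The item series of the cognitive screening test compared with that of the mini-mental state examination]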

    NARCIS (Netherlands)

    Schmand, B.; Deelman, B. G.; Hooijer, C.; Jonker, C.; Lindeboom, J.

    1996-01-01

    The items of the 'mini-mental state examination' (MMSE) and a Dutch dementia screening instrument, the 'cognitive screening test' (CST), as well as the 'geriatric mental status schedule' (GMS) and the 'Dutch adult reading test' (DART), were administered to 4051 elderly people aged 65 to 84 years.

  15. Maslach Burnout Inventory and a Self-Defined, Single-Item Burnout Measure Produce Different Clinician and Staff Burnout Estimates.

    Science.gov (United States)

    Knox, Margae; Willard-Grace, Rachel; Huang, Beatrice; Grumbach, Kevin

    2018-06-04

    Clinicians and healthcare staff report high levels of burnout. Two common burnout assessments are the Maslach Burnout Inventory (MBI) and a single-item, self-defined burnout measure. Relatively little is known about how the measures compare. To identify the sensitivity, specificity, and concurrent validity of the self-defined burnout measure compared to the more established MBI measure. Cross-sectional survey (November 2016-January 2017). Four hundred forty-four primary care clinicians and 606 staff from three San Francisco Area healthcare systems. The MBI measure, calculated from a high score on either the emotional exhaustion or cynicism subscale, and a single-item measure of self-defined burnout. Concurrent validity was assessed using a validated, 7-item team culture scale as reported by Willard-Grace et al. (J Am Board Fam Med 27(2):229-38, 2014) and a standard question about workplace atmosphere as reported by Rassolian et al. (JAMA Intern Med 177(7):1036-8, 2017) and Linzer et al. (Ann Intern Med 151(1):28-36, 2009). Similar to other nationally representative burnout estimates, 52% of clinicians (95% CI: 47-57%) and 46% of staff (95% CI: 42-50%) reported high MBI emotional exhaustion or high MBI cynicism. In contrast, 29% of clinicians (95% CI: 25-33%) and 31% of staff (95% CI: 28-35%) reported "definitely burning out" or more severe symptoms on the self-defined burnout measure. The self-defined measure's sensitivity to correctly identify MBI-assessed burnout was 50.4% for clinicians and 58.6% for staff; specificity was 94.7% for clinicians and 92.3% for staff. Area under the receiver operator curve was 0.82 for clinicians and 0.81 for staff. Team culture and atmosphere were significantly associated with both self-defined burnout and the MBI, confirming concurrent validity. Point estimates of burnout notably differ between the self-defined and MBI measures. Compared to the MBI, the self-defined burnout measure misses half of high-burnout clinicians and more
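
    Sensitivity and specificity of a single-item screen against a reference measure, as reported in this abstract, can be computed from paired classifications. The sketch below is illustrative only: the function name and the toy data are assumptions, and none of the study's participant-level data are reproduced.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(reference: Sequence[bool],
                            screen: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `screen` against `reference`.

    reference: True where the reference measure (here, the MBI) flags burnout.
    screen:    True where the single-item measure flags burnout.
    """
    pairs = list(zip(reference, screen))
    tp = sum(r and s for r, s in pairs)          # flagged by both
    fn = sum(r and not s for r, s in pairs)      # missed by the screen
    tn = sum(not r and not s for r, s in pairs)  # cleared by both
    fp = sum(not r and s for r, s in pairs)      # flagged by the screen only
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 4 reference-positive and 6 reference-negative respondents.
reference = [True] * 4 + [False] * 6
screen = [True, True, False, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(reference, screen)
```

    A sensitivity near 0.5 with high specificity, as in the toy data, is the pattern the abstract describes: the single item rarely flags people the reference measure clears, but it misses about half of those the reference measure flags.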

  16. Validity and usefulness of a single-item measure of patient-reported bother from side effects of cancer therapy.

    Science.gov (United States)

    Pearman, Timothy P; Beaumont, Jennifer L; Mroczek, Daniel; O'Connor, Mary; Cella, David

    2018-03-01

    The improving efficacy of cancer treatment has resulted in an increasing array of treatment-related symptoms and associated burdens imposed on individuals undergoing aggressive treatment of their disease. Often, clinical trials compare therapies that have different types, and severities, of adverse effects. Whether rated by clinicians or patients themselves, it can be difficult to know which side effect profile is more disruptive or bothersome to patients. A simple summary index of bother can help to adjudicate the variability in adverse effects across treatments being compared with each other. Across 4 studies, a total of 5765 patients enrolled in cooperative group studies and industry-sponsored clinical trials were the subjects of the current study. Patients were diagnosed with a range of primary cancer sites, including bladder, brain, breast, colon/rectum, head/neck, hepatobiliary, kidney, lung, ovary, pancreas, and prostate as well as leukemia and lymphoma. All patients were administered the Functional Assessment of Cancer Therapy-General version (FACT-G). The single item "I am bothered by side effects of treatment" (GP5), rated on a 5-point Likert scale, is part of the FACT-G. To determine its validity as a useful summary measure from the patient perspective, it was correlated with individual and aggregated clinician-rated adverse events and patient reports of their general ability to enjoy life. Analyses of pharmaceutical trials demonstrated that mean GP5 scores ("I am bothered by side effects of treatment") significantly differed by maximum adverse event grade. Effect sizes ranged from 0.13 to 0.46. Analyses of cooperative group trials demonstrated a significant correlation between GP5 and item GF3 ("I am able to enjoy life") in the predicted direction. The single FACT-G item "I am bothered by side effects of treatment" is significantly associated with clinician-reported adverse events and with patients' ability to enjoy their lives. It has promise as an

  17. Assessing Health Status in Inflammatory Bowel Disease using a Novel Single-Item Numeric Rating Scale

    Science.gov (United States)

    Surti, Bijal; Spiegel, Brennan; Ippoliti, Andrew; Vasiliauskas, Eric; Simpson, Peter; Shih, David; Targan, Stephan; McGovern, Dermot; Melmed, Gil Y.

    2014-01-01

    Background: Current instruments used to measure disease activity and health-related quality of life (HRQOL) in patients with Crohn’s disease (CD) and ulcerative colitis (UC) are often cumbersome, time-consuming, and expensive; although used in clinical trials, they are not convenient for clinical practice. A numeric rating scale (NRS) is a quick, inexpensive, and convenient patient-reported outcome (PRO) that can capture the patient’s overall perception of health. Aims: To assess the validity, reliability, and responsiveness of an NRS and evaluate its use in clinical practice in patients with CD and UC. Methods: We prospectively evaluated patient-reported NRS scores and measured correlations between NRS and a range of severity measures, including physician-reported NRS, Crohn’s disease activity index (CDAI), Harvey-Bradshaw index (HBI), inflammatory bowel disease questionnaire (IBDQ), and C-reactive protein (CRP) in patients with CD. Subsequently, we evaluated the correlation between the NRS and standard measures of health status (HBI or simple colitis clinical activity index [SCCAI]) and laboratory tests (sedimentation rate [ESR], CRP, and fecal calprotectin) in patients with CD and UC. Results: The patient-reported NRS showed excellent correlation with CDAI (R²=0.59, p<0.0001), IBDQ (R²=0.66, p<0.0001), and HBI (R²=0.32, p<0.0001) in patients with CD. The NRS showed poor, but statistically significant, correlation with SCCAI (R²=0.25, p<0.0001) in patients with UC. The NRS did not correlate with CRP, ESR, or calprotectin. The NRS was reliable and responsive to change. Conclusions: The NRS is a valid, reliable, and responsive measure that may be useful to evaluate patients with CD and possibly UC. PMID:23250673

  18. Assessment of chromium(VI) release from 848 jewellery items by use of a diphenylcarbazide spot test

    DEFF Research Database (Denmark)

    Bregnbak, David; Johansen, Jeanne D.; Hamann, Dathan

    2016-01-01

    We recently evaluated and validated a diphenylcarbazide(DPC)-based screening spot test that can detect the release of chromium(VI) ions (≥0.5 ppm) from various metallic items and leather goods (1). We then screened a selection of metal screws, leather shoes, and gloves, as well as 50 earrings, and identified chromium(VI) release from one earring. In the present study, we used the DPC spot test to assess chromium(VI) release in a much larger sample of jewellery items (n=848), 160 (19%) of which had previously been shown to contain chromium when analysed with X-ray fluorescence spectroscopy (2)....

  19. Development of an item bank and computer adaptive test for role functioning

    DEFF Research Database (Denmark)

    Anatchkova, Milena D; Rose, Matthias; Ware, John E

    2012-01-01

    Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning....

  20. A single-item self-report medication adherence question predicts hospitalisation and death in patients with heart failure.

    Science.gov (United States)

    Wu, Jia-Rong; DeWalt, Darren A; Baker, David W; Schillinger, Dean; Ruo, Bernice; Bibbins-Domingo, Kristen; Macabasco-O'Connell, Aurelia; Holmes, George M; Broucksou, Kimberly A; Erman, Brian; Hawk, Victoria; Cene, Crystal W; Jones, Christine DeLong; Pignone, Michael

    2014-09-01

    To determine whether a single-item self-report medication adherence question predicts hospitalisation and death in patients with heart failure. Poor medication adherence is associated with increased morbidity and mortality. Having a simple means of identifying suboptimal medication adherence could help identify at-risk patients for interventions. We performed a prospective cohort study in 592 participants with heart failure within a four-site randomised trial. Self-report medication adherence was assessed at baseline using a single-item question: 'Over the past seven days, how many times did you miss a dose of any of your heart medication?' Participants who reported no missing doses were defined as fully adherent, and those missing more than one dose were considered less than fully adherent. The primary outcome was combined all-cause hospitalisation or death over one year and the secondary endpoint was heart failure hospitalisation. Outcomes were assessed with blinded chart reviews, and heart failure outcomes were determined by a blinded adjudication committee. We used negative binomial regression to examine the relationship between medication adherence and outcomes. Fifty-two percent of participants were male, mean age was 61 years, and 31% were of New York Heart Association class III/IV at enrolment; 72% of participants reported full adherence to their heart medicine at baseline. Participants with full medication adherence had a lower rate of all-cause hospitalisation and death (0·71 events/year) compared with those with any nonadherence (0·86 events/year): adjusted-for-site incidence rate ratio was 0·83, fully adjusted incidence rate ratio 0·68. Incidence rate ratios were similar for heart failure hospitalisations. A single medication adherence question at baseline predicts hospitalisation and death over one year in heart failure patients. Medication adherence is associated with all-cause and heart failure-related hospitalisation and death in heart
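
    As a quick arithmetic check on the rates quoted in this abstract, the crude ratio of the two event rates can be computed directly. This is only a sketch: the study's reported ratios come from negative binomial regression with adjustment, so the crude ratio is expected to be close to, not identical with, the site-adjusted value. The function name is an assumption.

```python
def rate_ratio(rate_group: float, rate_reference: float) -> float:
    """Crude incidence rate ratio: event rate in one group over the reference group."""
    return rate_group / rate_reference


# Rates quoted in the abstract, in events per year.
fully_adherent = 0.71
any_nonadherence = 0.86

crude_irr = rate_ratio(fully_adherent, any_nonadherence)
print(f"crude IRR = {crude_irr:.2f}")
```

    The crude value rounds to 0.83, in line with the site-adjusted ratio in the abstract; the fully adjusted ratio of 0.68 cannot be recovered from the two rates alone because it depends on covariates not given here.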

  1. A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

    Science.gov (United States)

    Yao, Lihua; Schwarz, Richard D.

    2006-01-01

    Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…

  2. Differential Item Functioning in While-Listening Performance Tests: The Case of the International English Language Testing System (IELTS) Listening Module

    Science.gov (United States)

    Aryadoust, Vahid

    2012-01-01

    This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…

  3. An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research.

    Science.gov (United States)

    Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J

    2016-05-20

    Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
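
    Adaptive administration from an item bank usually means picking, at each step, the unused item that is most informative at the current ability estimate. The sketch below shows that selection rule under a unidimensional two-parameter logistic (2PL) model with dichotomous items. This is a deliberate simplification and an assumption: the study itself used a multidimensional bifactor model, and the item parameters here are invented for illustration.

```python
import math
from typing import List, Set, Tuple

Item = Tuple[float, float]  # (discrimination a, difficulty b)


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta: float, bank: List[Item], administered: Set[int]) -> int:
    """Index of the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical three-item bank.
bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
first = next_item(0.0, bank, set())
second = next_item(0.0, bank, {first})
```

    In a full CAT loop the ability estimate would be updated after each response and the loop would stop once a target reliability or standard error is reached; those steps are omitted here.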

  4. Introduction to Psychology and Leadership. Part Nine; Morale and Esprit De Corps. Progress Check. Test Item Pool. Segments I & II.

    Science.gov (United States)

    Westinghouse Learning Corp., Annapolis, MD.

    Test items for the introduction to psychology and leadership course (see the final reports which summarize the course development project, EM 010 418, EM 010 419, and EM 010 484) which were compiled as part of the project documentation and which are coordinated with the text-workbook on morale and esprit de corps (EM 010 439, EM 010 440, and EM…

  5. Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

    Science.gov (United States)

    Solano-Flores, Guillermo

    2014-01-01

    This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…

  6. An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

    Science.gov (United States)

    Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

    2013-01-01

    Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

  7. Development of a Postacute Hospital Item Bank for the New Pediatric Evaluation of Disability Inventory-Computer Adaptive Test

    Science.gov (United States)

    Dumas, Helene M.

    2010-01-01

    The PEDI-CAT is a new computer adaptive test (CAT) version of the Pediatric Evaluation of Disability Inventory (PEDI). Additional PEDI-CAT items specific to postacute pediatric hospital care were recently developed using expert reviews and cognitive interviewing techniques. Expert reviews established face and construct validity, providing positive…

  8. Effectiveness of Item Response Theory (IRT) Proficiency Estimation Methods under Adaptive Multistage Testing. Research Report. ETS RR-15-11

    Science.gov (United States)

    Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry

    2015-01-01

    The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…
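
    The 2-stage design described here (one routing module, then one of three second-stage modules, giving three paths) can be sketched as a simple routing function. Number-correct routing, the module labels, and the cut scores are all assumptions for illustration; the abstract does not state the routing rule used in the report.

```python
from typing import List


def route_stage2(stage1_responses: List[int], low_cut: int, high_cut: int) -> str:
    """Choose the Stage 2 module from the Stage 1 number-correct score.

    stage1_responses: 0/1 scored responses to the routing module.
    low_cut, high_cut: hypothetical cut scores separating the three paths.
    """
    score = sum(stage1_responses)
    if score < low_cut:
        return "easy"
    if score < high_cut:
        return "medium"
    return "hard"


# Hypothetical five-item routing module with cuts at 2 and 4 correct.
paths = [
    route_stage2([1, 0, 0, 0, 0], low_cut=2, high_cut=4),
    route_stage2([1, 1, 1, 0, 0], low_cut=2, high_cut=4),
    route_stage2([1, 1, 1, 1, 1], low_cut=2, high_cut=4),
]
```

    Because examinees on different paths answer different items, their number-correct scores are not directly comparable, which is why proficiency estimation under MST relies on IRT-based estimators.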

  9. Examining Construct Congruence for Psychometric Tests: A Note on an Extension to Binary Items and Nesting Effects

    Science.gov (United States)

    Raykov, Tenko; Marcoulides, George A.; Dimitrov, Dimiter M.; Li, Tatyana

    2018-01-01

    This article extends the procedure outlined in the article by Raykov, Marcoulides, and Tong for testing congruence of latent constructs to the setting of binary items and clustering effects. In this widely used setting in contemporary educational and psychological research, the method can be used to examine if two or more homogeneous…

  10. Biological Science: An Ecological Approach. BSCS Green Version. Teacher's Resource Book and Test Item Bank. Sixth Edition.

    Science.gov (United States)

    Biological Sciences Curriculum Study, Colorado Springs.

    This book consists of four sections: (1) "Supplemental Materials"; (2) "Supplemental Investigations"; (3) "Test Item Bank"; and (4) "Blackline Masters." The first section provides additional background material related to selected chapters and investigations in the student book. Included are a periodic table of the elements, genetics problems and…

  11. 49 CFR 238.311 - Single car test.

    Science.gov (United States)

    2010-10-01

    ... Requirements for Tier I Passenger Equipment § 238.311 Single car test. (a) Except for self-propelled passenger cars, single car tests of all passenger cars and all unpowered vehicles used in passenger trains shall...

  12. Threats to Validity When Using Open-Ended Items in International Achievement Studies: Coding Responses to the PISA 2012 Problem-Solving Test in Finland

    Science.gov (United States)

    Arffman, Inga

    2016-01-01

    Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…

  13. Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

    Science.gov (United States)

    Cao, Yi; Lu, Ru; Tao, Wei

    2014-01-01

    The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
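
    For reference, the 2PL model and the local independence assumption mentioned in this abstract can be written as follows. The notation is the conventional one and is an assumption here, since the abstract gives no formulas: $\theta_j$ is the ability of examinee $j$, and $a_i$ and $b_i$ are the discrimination and difficulty of item $i$.

```latex
% 2PL item response function
P(X_{ij} = 1 \mid \theta_j)
  = \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}

% Local item independence: responses are independent given ability
P(X_{1j} = x_1, \dots, X_{nj} = x_n \mid \theta_j)
  = \prod_{i=1}^{n} P(X_{ij} = x_i \mid \theta_j)
```

    Items within a testlet share a common stimulus, so their responses tend to remain correlated even after conditioning on $\theta_j$, which is the violation the report addresses.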

  14. Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

    Science.gov (United States)

    Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

    2016-01-01

    Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…

  15. Testing the Item-Order Account of Design Effects Using the Production Effect

    Science.gov (United States)

    Jonker, Tanya R.; Levene, Merrick; MacLeod, Colin M.

    2014-01-01

    A number of memory phenomena evident in recall in within-subject, mixed-lists designs are reduced or eliminated in between-subject, pure-list designs. The item-order account (McDaniel & Bugg, 2008) proposes that differential retention of order information might underlie this pattern. According to this account, order information may be encoded…

  16. Detecting intrajudge inconsistency in standard setting using test items with a selected-response format

    NARCIS (Netherlands)

    van der Linden, Willem J.; Vos, Hendrik J.; Chang, Lei

    2002-01-01

    In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of

  17. Design of Web Questionnaires: A Test for Number of Items per Screen

    NARCIS (Netherlands)

    Toepoel, V.; Das, J.W.M.; van Soest, A.H.O.

    2005-01-01

    This paper presents results from an experimental manipulation of one versus multiple-items per screen format in a Web survey. The purpose of the experiment was to find out if a questionnaire's format influences how respondents provide answers in online questionnaires and if this is depending on

  18. Re-Examining Test Item Issues in the TIMSS Mathematics and Science Assessments

    Science.gov (United States)

    Wang, Jianjun

    2011-01-01

    As the largest international study ever undertaken, the Trends in International Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…

  19. Using existing questionnaires in latent class analysis: should we use summary scores or single items as input? A methodological study using a cohort of patients with low back pain

    Directory of Open Access Journals (Sweden)

    Nielsen AM

    2016-04-01

    of more subgroups and more distinct clinical characteristics. Conclusion: In these data, application of both the summary-score strategy and the single-item strategy in the LCA subgrouping resulted in clinically interpretable subgroups, but the single-item strategy generally revealed more distinguishing characteristics. These results (1) warrant further analyses in other data sets to determine the consistency of this finding, and (2) warrant investigation in longitudinal data to test whether the finer detail provided by the single-item strategy results in improved prediction of outcomes and treatment response. Keywords: classification, data mining, subgrouping, clinical interpretability, questionnaire, low back pain

  20. Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items.

    Science.gov (United States)

    Sinharay, Sandip; Jensen, Jens Ledet

    2018-06-27

    In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3-26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238-254, 2010; Glas & Dagohoy, Psychometrika 72:159-180, 2007; Guo & Drasgow, Int J Sel Assess 18:351-364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193-206, 1990; Sinharay, J Educ Behav Stat 42:46-68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307-322, 1986) and the Lugannani-Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475-490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
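
    One of the traditional approaches listed in this abstract, the likelihood ratio test for equal ability over two item sets, can be sketched as below. This is not the paper's higher-order method. It assumes a Rasch model with known item difficulties, uses plain Newton-Raphson for the ability estimates, and requires each response pattern to contain both correct and incorrect answers so that the estimates are finite. All function names and data are illustrative.

```python
import math
from typing import List


def log_lik(theta: float, x: List[int], b: List[float]) -> float:
    """Rasch log-likelihood of 0/1 responses x given ability and difficulties b."""
    return sum(xi * (theta - bi) - math.log1p(math.exp(theta - bi))
               for xi, bi in zip(x, b))


def mle(x: List[int], b: List[float], iters: int = 50) -> float:
    """Maximum likelihood ability estimate by Newton-Raphson (mixed patterns only)."""
    theta = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(theta - bi))) for bi in b]
        gradient = sum(x) - sum(p)
        information = sum(pi * (1.0 - pi) for pi in p)
        theta += gradient / information
    return theta


def lr_statistic(x1: List[int], b1: List[float],
                 x2: List[int], b2: List[float]) -> float:
    """2 * (log-lik with separate abilities - log-lik with one common ability)."""
    t1, t2 = mle(x1, b1), mle(x2, b2)
    t0 = mle(x1 + x2, b1 + b2)
    separate = log_lik(t1, x1, b1) + log_lik(t2, x2, b2)
    common = log_lik(t0, x1 + x2, b1 + b2)
    return 2.0 * (separate - common)


# Hypothetical difficulties shared by both item sets.
b = [-1.0, -0.5, 0.5, 1.0]
same = lr_statistic([1, 0, 1, 0], b, [1, 0, 1, 0], b)
different = lr_statistic([1, 1, 1, 0], b, [0, 0, 0, 1], b)
```

    Under the null hypothesis of equal ability the statistic is referred to a chi-square distribution with one degree of freedom; with item sets this short that approximation is rough, which is the kind of situation where the abstract reports that higher-order methods offer some improvement.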

  1. Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients

    DEFF Research Database (Denmark)

    Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J. B.

    2017-01-01

    on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). METHODS: In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients...... model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study...... sample sizes with about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. CONCLUSION: A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient...

  2. Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

    Science.gov (United States)

    Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

    2017-11-01

    The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.

  3. Concurrent Validity and Sensitivity to Change of Direct Behavior Rating Single-Item Scales (DBR-SIS) within an Elementary Sample

    Science.gov (United States)

    Smith, Rhonda L.; Eklund, Katie; Kilgus, Stephen P.

    2018-01-01

    The purpose of this study was to evaluate the concurrent validity, sensitivity to change, and teacher acceptability of Direct Behavior Rating single-item scales (DBR-SIS), a brief progress monitoring measure designed to assess student behavioral change in response to intervention. Twenty-four elementary teacher-student dyads implemented a daily…

  4. Open Single Item of Perceived Risk Factors (OSIPRF) toward Cardiovascular Diseases Is an Appropriate Instrument for Evaluating Psychological Symptoms

    Directory of Open Access Journals (Sweden)

    Mozhgan Saeidi

    2016-12-01

    Psychological symptoms are considered one of the aspects and consequences of cardiovascular diseases (CVDs), and managing them can hasten and facilitate the process of recovery. Evaluation of the psychological symptoms can increase the treatment team’s awareness of patients’ mental health, which can be beneficial for designing treatment programs (1). However, the time-consuming process of interviews and assessment by questionnaires leads to fatigue and lack of patient cooperation, which may be problematic for healthcare evaluators. Therefore, the use of brief and suitable alternatives is always recommended. The use of practical and easy-to-implement instruments is constantly emphasized. A practical method for assessing patients’ psychological status is examining causal beliefs and attitudes about the disease. The causal beliefs and risk factors perceived by patients, which are significantly related to the actual risk factors for CVDs (2), are not only related to psychological adjustment and mental health but also have an impact on patients’ compliance with treatment recommendations (3). It seems that several factors are at play in shaping the perceived risk factors for CVDs, such as gender (4), age (5), and most importantly, patients’ psychological status (3). Accordingly, evaluation of causal beliefs and perceived risk factors could probably be a shortcut method for evaluating patients’ psychological health. In recent years, Saeidi and Komasi (5) proposed a question and investigated the perceived risk factors with an open single item: “What do you think is the main cause of your illness?”. According to the authors, the perceived risk factors are recorded in five categories including biological (age, gender, and family history), environmental (dust, smoke, passive smoking, toxic substances, and effects of war), physiological (diabetes, hypertension, hyperlipidemia, and obesity), behavioral (lack of exercise, nutrition

  5. The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease

    DEFF Research Database (Denmark)

    Jørgensen, Kasper; Kristensen, Maria K; Waldemar, Gunhild

    2015-01-01

    This study presents a reliable, short and practical version of the Clock Drawing Test (CDT) for clinical use and examines its diagnostic accuracy in mild Alzheimer's disease versus elderly nonpatients. Clock drawings from 231 participants were scored independently by four clinical neuropsychologists blind to diagnostic classification. The interrater agreement of individual scoring criteria was analyzed and items with poor or moderate reliability were excluded. The classification accuracy of the resulting scoring system - the six-item CDT - was examined. We explored the effect of further...

  6. 'Do you think you suffer from depression?' Reevaluating the use of a single item question for the screening of depression in older primary care patients

    DEFF Research Database (Denmark)

    Ayalon, Liat; Goldfracht, Margalit; Bech, Per

    2010-01-01

    ... evaluated against a depression diagnosis made by the Structured Clinical Interview for DSM-IV. RESULTS: Overall, 3.9% of the sample was diagnosed with depression. The most notable finding was that the single-item question, 'do you think you suffer from depression?' had as good or better sensitivity (83%) than all other screens. Nonetheless, its specificity of 83% suggested that it has to be followed up by a thorough diagnostic interview. Additional sensitivity analyses concerning the use of a single depression item taken directly from the depression screening measures supported this finding. CONCLUSIONS: An easy way to detect depression in older primary care patients would be asking the single question, 'do you think you suffer from depression?'
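
    As a rough illustration of why a specificity of 83% calls for a follow-up interview at this prevalence, the sketch below applies Bayes' rule to the three figures reported in the abstract (3.9% prevalence, 83% sensitivity, 83% specificity). The function name is illustrative, and the resulting predictive value is derived here; it is not a number reported by the authors.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive screen is a true case (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Figures taken from the abstract; the PPV itself is a derived quantity.
ppv = positive_predictive_value(prevalence=0.039, sensitivity=0.83, specificity=0.83)
print(f"PPV = {ppv:.1%}")
```

    Under these figures only about one positive screen in six would correspond to a diagnosed case, which is consistent with the abstract's call for a thorough diagnostic interview after a positive answer.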

  7. Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

    Science.gov (United States)

    Kim, Seonghoon

    2013-01-01

    With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
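
    The recursion mentioned in this abstract can be sketched for the original dichotomous (number-correct) case. This is a minimal illustration, not the generalized real-number-score algorithm the article presents; the two-parameter logistic response function and the item parameters below are assumptions chosen only for the example.

```python
import math


def two_pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lord_wingersky(probs):
    """Number-correct score distribution given per-item success probabilities.

    Returns a list dist where dist[x] = P(score == x | theta).
    """
    dist = [1.0]  # with no items, a score of 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)  # item answered incorrectly
            new[score + 1] += mass * p      # item answered correctly
        dist = new
    return dist


# Hypothetical (a, b) item parameters, for illustration only.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]
theta = 0.0
dist = lord_wingersky([two_pl(theta, a, b) for a, b in items])
print(dist)
```

    Each pass adds one item and splits the existing probability mass between "score unchanged" and "score plus one", so the cost grows quadratically with test length instead of exponentially.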

  8. Re-Fitting for a Different Purpose: A Case Study of Item Writer Practices in Adapting Source Texts for a Test of Academic Reading

    Science.gov (United States)

    Green, Anthony; Hawkey, Roger

    2012-01-01

    The important yet under-researched role of item writers in the selection and adaptation of texts for high-stakes reading tests is investigated through a case study involving a group of trained item writers working on the International English Language Testing System (IELTS). In the first phase of the study, participants were invited to reflect in…

  9. 'Do you think you suffer from depression?' Reevaluating the use of a single item question for the screening of depression in older primary care patients

    DEFF Research Database (Denmark)

    Ayalon, Liat; Goldfracht, Margalit; Bech, Per

    2010-01-01

    OBJECTIVES: The majority of older adults seek depression treatment in primary care. Despite impressive efforts to integrate depression treatment into primary care, depression often remains undetected. The overall goal of the present study was to compare a single item screening for depression to existing depression screening tools. METHODS: A cross sectional sample of 153 older primary care patients. Participants completed several depression-screening measures (e.g. a single depression screen, Patient Health Questionnaire-9, Major Depression Inventory, Visual Analogue Scale). Measures were ... CONCLUSIONS: An easy way to detect depression in older primary care patients would be asking the single question, 'do you think you suffer from depression?'

  10. What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling

    Science.gov (United States)

    Koller, Ingrid; Levenson, Michael R.; Glück, Judith

    2017-01-01

    The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777

  11. Development of an item bank for the EORTC Role Functioning Computer Adaptive Test (EORTC RF-CAT)

    DEFF Research Database (Denmark)

    Gamper, Eva-Maria; Petersen, Morten Aa.; Aaronson, Neil

    2016-01-01

    ... a computer-adaptive test (CAT) for RF. This was part of a larger project whose objective is to develop a CAT version of the EORTC QLQ-C30 which is one of the most widely used HRQOL instruments in oncology. METHODS: In accordance with EORTC guidelines, the development of the RF-CAT comprised four phases ... with good psychometric properties. The resulting item bank exhibits excellent reliability (mean reliability = 0.85, median = 0.95). Using the RF-CAT may allow sample size savings from 11% up to 50% compared to using the QLQ-C30 RF scale. CONCLUSIONS: The RF-CAT item bank improves the precision...

  12. Tests of the validity of a model relating frequency of contaminated items and increasing radiation dose

    International Nuclear Information System (INIS)

    Tallentire, A.; Khan, A.A.

    1975-01-01

    The ⁶⁰Co radiation response of Bacillus pumilus E601 spores has been characterized when present in a laboratory test system. The suitability of test vessels to act as both containers for irradiation and culture vessels in sterility testing has been checked. Tests have been done with these spores to verify assumptions basic to the general model described in a previous paper. First measurements indicate that the model holds with this laboratory test system. (author)

  13. The differential item functioning and structural equivalence of a nonverbal cognitive ability test for five language groups

    Directory of Open Access Journals (Sweden)

    Pieter Schaap

    2011-10-01

    Research purpose: The aim of the study was to determine the differential item functioning (DIF) and structural equivalence of a nonverbal cognitive ability test (the PiB/SpEEx Observance Test [401]) for five South African language groups. Motivation for study: Tests that are sensitive to cultural and language group can lead to unfair discrimination and are a contentious workplace issue in South Africa today. Misconceptions about psychometric testing in industry can cause tests to lose credibility if industries do not use a scientifically sound test-by-test evaluation approach. Research design, approach and method: The researcher used a quasi-experimental design and factor analytic and logistic regression techniques to meet the research aims. The study used a convenience sample drawn from industry and an educational institution. Main findings: The main findings of the study show structural equivalence of the test at a holistic level and nonsignificant DIF effect sizes for most of the comparisons that the researcher made. Practical/managerial implications: This research shows that the PiB/SpEEx Observance Test (401) is not completely language insensitive. One should see it rather as a language-reduced test when people from different language groups need testing. Contribution/value-add: The findings provide supporting evidence that nonverbal cognitive tests are plausible alternatives to verbal tests when one compares people from different language groups.

  14. The role of attention in item-item binding in visual working memory.

    Science.gov (United States)

    Peterson, Dwight J; Naveh-Benjamin, Moshe

    2017-09-01

    An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  15. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

    Directory of Open Access Journals (Sweden)

    Bongyeun Koh

    2016-01-01

    Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.

  16. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

    Science.gov (United States)

    Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

    2016-01-01

    The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.

  17. The Role of Different Cognitive Components in the Prediction of the Figural Reasoning Test's Item Difficulty

    Institute of Scientific and Technical Information of China (English)

    李中权; 王力; 张厚粲; 周仁来

    2011-01-01

    Figural reasoning tests (as represented by Raven's tests) are widely applied as effective measures of fluid intelligence in recruitment and personnel selection. However, several studies have revealed that those tests are no longer appropriate due to high item exposure rates. Computerized automatic item generation (AIG) has gradually been recognized as a promising technique for handling item exposure. Understanding sources of item variation constitutes the initial stage of computerized AIG, that is, searching for the underlying processing components and the stimuli that significantly influence those components. Some studies have explored sources of item variation, but so far there are no consistent results. This study investigated the relation between item difficulties and stimuli factors (e.g., familiarity of figures, abstraction of attributes, perceptual organization, and memory load) and determined the relative importance of those factors in predicting item difficulties. Eight sets of figural reasoning tests (each set containing 14 items imitating items from Raven's Advanced Progressive Matrices, APM) were constructed by manipulating the familiarity of figures, the degree of abstraction of attributes, the perceptual organization, and the types and number of rules. Using an anchor-test design, these tests were administered via the internet to 6323 participants, with 10 items drawn from the APM as anchor items; thus, each participant completed 14 items from one set and 10 anchor items within half an hour. To prevent participants from using a response elimination strategy, we presented one item stem first, then the alternatives in turn, and asked participants to determine which alternative was the best. DIMTEST analyses were conducted on the participants' responses on each of the eight tests. Results showed that the items measure a single dimension on each test. A likelihood ratio test indicated that the data fit the two-parameter logistic model (2PL) best. Items were

  18. Developing Testing Accommodations for English Language Learners: Illustrations as Visual Supports for Item Accessibility

    Science.gov (United States)

    Solano-Flores, Guillermo; Wang, Chao; Kachchaf, Rachel; Soltero-Gonzalez, Lucinda; Nguyen-Le, Khanh

    2014-01-01

    We address valid testing for English language learners (ELLs)--students in the United States who are schooled in English while they are still acquiring English as a second language. Also, we address the need for procedures for systematically developing ELL testing accommodations--changes in tests intended to support ELLs to gain access to the…

  19. On the issue of item selection in computerized adaptive testing with response times

    NARCIS (Netherlands)

    Veldkamp, Bernard P.

    2016-01-01

    Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed

  20. Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

    Science.gov (United States)

    Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

    2018-01-01

    Objective: The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods: The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results: The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion: The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with fewer than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with
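
    The two stopping rules reported in this abstract (reliability greater than 0.9, or 14 items administered) can be sketched as a simple check applied after each item. This is only an illustration under stated assumptions: the function names and the reliability trajectory are hypothetical, and the abstract does not describe how the authors computed reliability during administration.

```python
def should_stop(reliability, items_administered, min_reliability=0.9, max_items=14):
    """Stop when reliability exceeds the threshold or the item cap is reached."""
    return reliability > min_reliability or items_administered >= max_items


def run_cat(reliability_after_each_item):
    """Return the test length implied by the stopping rule.

    reliability_after_each_item[i] is a hypothetical reliability of the
    score estimate after i + 1 items have been administered.
    """
    for n, rel in enumerate(reliability_after_each_item, start=1):
        if should_stop(rel, n):
            return n
    return len(reliability_after_each_item)


# Made-up trajectory for one simulated respondent.
trajectory = [0.55, 0.70, 0.79, 0.85, 0.88, 0.91, 0.93]
print(run_cat(trajectory))  # → 6
```

    A respondent whose estimate never passes 0.9 would be stopped by the 14-item cap, which matches the abstract's report that some scores ended with a mean reliability of 0.86.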

  1. Results of wholesomeness test on basic plan of research and development of food irradiation (7 items)

    International Nuclear Information System (INIS)

    Furuya, Tsuyoshi

    1989-01-01

    Twenty years have elapsed since the general research on food irradiation was begun in Japan as the new technology for food preservation, and the research on the wholesomeness of irradiated foods has been carried out in wide range together with the research on irradiation effect, irradiation techniques and economical efficiency. The wholesomeness of irradiated foods includes chronic toxicity including carcinogenic property in the continuous intake for long period, the effect to reproduction function over many generations and the possibility of giving hereditary injury to cells, the nutritional adequacy required for the sustenance of life and the increase of health, and microbiological safety. In Japan, the research on food irradiation was designated as an atomic energy specific general research, and as the objects of research, potato and onion for the prevention of germination, rice and wheat for the protection from noxious insects, fish paste products, wienerwurst and mandarin orange for sterilization were selected. For the irradiation, Co-60 gamma ray was used except the case of mandarin orange using electron beam. The research on all 7 items was finished, and the irradiation of potato was permitted. (K.I.)

  2. Specificity and false positive rates of the Test of Memory Malingering, Rey 15-item Test, and Rey Word Recognition Test among forensic inpatients with intellectual disabilities.

    Science.gov (United States)

    Love, Christopher M; Glassmire, David M; Zanolini, Shanna Jordan; Wolf, Amanda

    2014-10-01

    This study evaluated the specificity and false positive (FP) rates of the Rey 15-Item Test (FIT), Word Recognition Test (WRT), and Test of Memory Malingering (TOMM) in a sample of 21 forensic inpatients with mild intellectual disability (ID). The FIT demonstrated an FP rate of 23.8% with the standard quantitative cutoff score. Certain qualitative error types on the FIT showed promise and had low FP rates. The WRT obtained an FP rate of 0.0% with previously reported cutoff scores. Finally, the TOMM demonstrated low FP rates of 4.8% and 0.0% on Trial 2 and the Retention Trial, respectively, when applying the standard cutoff score. FP rates are reported for a range of cutoff scores and compared with published research on individuals diagnosed with ID. Results indicated that although the quantitative variables on the FIT had unacceptably high FP rates, the TOMM and WRT had low FP rates, increasing the confidence clinicians can place in scores reflecting poor effort on these measures during ID evaluations. © The Author(s) 2014.
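
    The false positive rates above follow directly from the sample size of 21, and specificity is simply one minus the FP rate. The sketch below reproduces that arithmetic; the counts of flagged patients (5, 1, 0) are inferred here from the reported percentages and are not stated in the abstract.

```python
N_PATIENTS = 21  # sample size reported in the abstract

# Counts inferred from the reported percentages (an assumption, not source data).
flagged = {"FIT standard cutoff": 5, "TOMM Trial 2": 1, "WRT": 0}

results = {}
for measure, n_flagged in flagged.items():
    fp_rate = n_flagged / N_PATIENTS
    # (false positive rate in %, specificity in %), rounded to one decimal
    results[measure] = (round(100 * fp_rate, 1), round(100 * (1 - fp_rate), 1))
    print(f"{measure}: FP rate {results[measure][0]}%, specificity {results[measure][1]}%")
```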

  3. Measuring Cognitive Load in Test Items: Static Graphics versus Animated Graphics

    Science.gov (United States)

    Dindar, M.; Kabakçi Yurdakul, I.; Inan Dönmez, F.

    2015-01-01

    The majority of multimedia learning studies focus on the use of graphics in learning process but very few of them examine the role of graphics in testing students' knowledge. This study investigates the use of static graphics versus animated graphics in a computer-based English achievement test from a cognitive load theory perspective. Three…

  4. Explanatory Item Response Modeling of Children's Change on a Dynamic Test of Analogical Reasoning

    Science.gov (United States)

    Stevenson, Claire E.; Hickendorff, Marian; Resing, Wilma C. M.; Heiser, Willem J.; de Boeck, Paul A. L.

    2013-01-01

    Dynamic testing is an assessment method in which training is incorporated into the procedure with the aim of gauging cognitive potential. Large individual differences are present in children's ability to profit from training in analogical reasoning. The aim of this experiment was to investigate sources of these differences on a dynamic test of…

  5. How to Reason with Economic Concepts: Cognitive Process of Japanese Undergraduate Students Solving Test Items

    Science.gov (United States)

    Asano, Tadayoshi; Yamaoka, Michio

    2015-01-01

    The authors administered a Japanese version of the Test of Understanding in College Economics, the fourth edition (TUCE-4) to assess the economic literacy of Japanese undergraduate students in 2006 and 2009. These two test results were combined to investigate students' cognitive process or reasoning with specific economic concepts and principles…

  6. Single Event Effects (SEE) Testing: Practical Approach to Test Plans

    Science.gov (United States)

    LaBel, Kenneth A.; Pellish, Jonathan Allen; Berg, Melanie D.

    2014-01-01

    While standards and guidelines for performing SEE testing have existed for several decades, guidance for developing SEE test plans has not been as easy to find. In this presentation, the variety of areas that need to be considered ranging from resource issues (funds, personnel, schedule) to extremely technical challenges (particle interaction and circuit application), shall be discussed. Note: we consider the approach outlined here as a "living" document: Mission-specific constraints and new technology related issues always need to be taken into account.

  7. Single-item measures for depression and anxiety: Validation of the Screening Tool for Psychological Distress in an inpatient cardiology setting.

    Science.gov (United States)

    Young, Quincy-Robyn; Nguyen, Michelle; Roth, Susan; Broadberry, Ann; Mackay, Martha H

    2015-12-01

    Depression and anxiety are common among patients with cardiovascular disease (CVD) and confer significant cardiac risk, contributing to CVD morbidity and mortality. Unfortunately, due to the lack of screening tools that address the specific needs of hospitalized patients, few cardiac inpatient programs offer routine screening for these forms of psychological distress, despite recommendations to do so. The purpose of this study was to validate single-item measures for depression and anxiety among cardiac inpatients. Consecutive inpatients were recruited from the cardiology and cardiac surgery step-down units at a university-affiliated, quaternary-care hospital. Subjects completed a questionnaire that included: (a) demographics, (b) single-item-measures for depression and anxiety (from the Screening Tool for Psychological Distress (STOP-D)), and (c) Hospital Anxiety and Depression Scale (HADS). One hundred and five participants were recruited with a wide variety of cardiac diagnoses, having a mean age of 66 years, and 28% were women. Both STOP-D items were highly correlated with their corresponding validated measures and demonstrated robust receiver-operator characteristic curves. Severity scores on both items correlated well with established severity cut-off scores on the corresponding subscales of the HADS. The STOP-D is a self-administered, self-report measure using two independent items that provide severity scores for depression and anxiety. The tool performs very well compared with other previously validated measures. Requiring no additional scoring and being free, STOP-D offers a simple and valid method for identifying hospitalized cardiac patients who are experiencing psychological distress. This crucial first step triggers initiation of appropriate monitoring and intervention, thus reducing the likelihood of the adverse cardiac outcomes associated with psychological distress. © The European Society of Cardiology 2014.

  8. Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice Versus Constructed-Response Training Items

    NARCIS (Netherlands)

    Stevenson, C.E.; Heiser, W.J.; Resing, W.C.M.

    2016-01-01

    Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC

  9. Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

    Science.gov (United States)

    Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

    2017-01-01

    Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…

  10. Barriers and benefits to desired behaviors for single use plastic items in northeast Ohio's Lake Erie basin.

    Science.gov (United States)

    Bartolotta, Jill F; Hardy, Scott D

    2018-02-01

    Given the growing saliency of plastic marine debris, and the impact of plastics on beaches and aquatic environments in the Laurentian Great Lakes, applied research is needed to support municipal and nongovernmental campaigns to prevent debris from reaching the water's edge. This study addresses this need by examining the barriers and benefits to positive behavior for two plastic debris items in northeast Ohio's Lake Erie basin: plastic bags and plastic water bottles. An online survey is employed to gather data on the use and disposal of these plastic items and to solicit recommendations on how to positively change behavior to reduce improper disposal. Results support a ban on plastic bags and plastic water bottles, with more enthusiasm for a bag ban. Financial incentives are also seen as an effective way to influence behavior change, as are location-specific solutions focused on education and outreach. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement.

    Science.gov (United States)

    McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H

    2018-01-23

    Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. The 27-item

  12. Algorithms for the Construction of Parallel Tests by Zero-One Programming. Project Psychometric Aspects of Item Banking No. 7. Research Report 86-7.

    Science.gov (United States)

    Boekkooi-Timminga, Ellen

    Nine methods for automated test construction are described. All are based on the concepts of information from item response theory. Two general kinds of methods for the construction of parallel tests are presented: (1) sequential test design; and (2) simultaneous test design. Sequential design implies that the tests are constructed one after the…

  13. The 40-item Monell Extended Sniffin' Sticks Identification Test (MONEX-40)

    NARCIS (Netherlands)

    Freiherr, J.; Gordon, A.R.; Alden, E.C.; Ponting, A.L.; Hernandez, M.; Boesveldt, S.; Lundstrom, J.N.

    2012-01-01

    Background Most existing olfactory identification (ID) tests have the primary aim of diagnosing clinical olfactory dysfunction, thereby rendering them sub-optimal for experimental settings where the aim is to detect differences in healthy subjects’ odor ID abilities. Materials and methods We have

  14. Explanatory item response modeling of children's change on a dynamic test of analogical reasoning

    NARCIS (Netherlands)

    Stevenson, C.E.; Hickendorff, M.; Resing, W.C.M.; Heiser, W.J.; de Boeck, P.A.L.

    Dynamic testing is an assessment method in which training is incorporated into the procedure with the aim of gauging cognitive potential. Large individual differences are present in children's ability to profit from training in analogical reasoning. The aim of this experiment was to investigate

  15. Industrial Arts Test Development, Book III. Resource Items for Graphics Technology, Power Technology, Production Technology.

    Science.gov (United States)

    New York State Education Dept., Albany.

    This booklet is designed to assist teachers in developing examinations for classroom use. It is a collection of 955 objective test questions, mostly multiple choice, for industrial arts students in the three areas of graphics technology, power technology, and production technology. Scoring keys are provided. There are no copyright restrictions,…

  16. Psychometric evaluation of the EORTC computerized adaptive test (CAT) fatigue item pool

    DEFF Research Database (Denmark)

    Petersen, Morten Aa; Giesinger, Johannes M; Holzner, Bernhard

    2013-01-01

    Fatigue is one of the most common symptoms associated with cancer and its treatment. To obtain a more precise and flexible measure of fatigue, the EORTC Quality of Life Group has developed a computerized adaptive test (CAT) measure of fatigue. This is part of an ongoing project developing a CAT...

  17. Evaluation of the box and blocks test, stereognosis and item banks of activity and upper extremity function in youths with brachial plexus birth palsy.

    Science.gov (United States)

    Mulcahey, Mary Jane; Kozin, Scott; Merenda, Lisa; Gaughan, John; Tian, Feng; Gogola, Gloria; James, Michelle A; Ni, Pengsheng

    2012-09-01

    One of the greatest limitations to measuring outcomes in pediatric orthopaedics is the lack of effective instruments. Computer adaptive testing, which uses large item banks, selects only items that are relevant to a child's function based on a previous response and filters out items that are too easy, too hard, or simply not relevant to the child. In this way, computer adaptive testing provides a meaningful, efficient, and precise method to evaluate patient-reported outcomes. Banks of items that assess activity and upper extremity (UE) function have been developed for children with cerebral palsy and have enabled computer adaptive tests that showed strong reliability, strong validity, and broader content range when compared with traditional instruments. Because of the void in instruments for children with brachial plexus birth palsy (BPBP) and the importance of having a UE and activity scale, we were interested in how well these items worked in this population. A cross-sectional, multicenter study involving 200 children with BPBP was conducted. The box and block test (BBT) and stereognosis tests were administered, and patient reports of UE function and activity were obtained with the cerebral palsy item banks. Differential item functioning (DIF) was examined. The predictive ability of the BBT and stereognosis was evaluated with a proportional odds logistic regression model. Spearman correlation coefficients (rs) were calculated to examine the correlation between stereognosis and the BBT and between individual stereognosis items and the total stereognosis score. Six of the 86 items showed DIF, indicating that the activity and UE item banks may be useful for computer adaptive tests for children with BPBP. The penny and the button were the strongest predictors of impairment level (odds ratio=0.34 to 0.40). There was a good positive relationship between total stereognosis and BBT scores (rs=0.60). The BBT had a good negative (rs=-0.55) and good positive (rs=0.55) relationship with

  18. Using Automated Processes to Generate Test Items And Their Associated Solutions and Rationales to Support Formative Feedback

    Directory of Open Access Journals (Sweden)

    Mark Gierl

    2015-08-01

    Automatic item generation is the process of using item models to produce assessment tasks using computer technology. An item model is similar to a template that highlights the elements in the task that must be manipulated to produce new items. The purpose of our study is to describe an innovative method for generating large numbers of diverse and heterogeneous items along with their solutions and associated rationales to support formative feedback. We demonstrate the method by generating items in two diverse content areas, mathematics and nonverbal reasoning

  19. Single-shell tank riser resistance to ground test plan

    International Nuclear Information System (INIS)

    Kiewert, L.R.

    1996-01-01

    This Test Procedure provides the general directions for conducting Single-Shell Tank Riser to Earth Measurements, which will be used by engineering as a step towards providing closure for the Lightning Hazard Issue.

  20. The Role of Item Models in Automatic Item Generation

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  1. Testing the robustness of deterministic models of optimal dynamic pricing and lot-sizing for deteriorating items under stochastic conditions

    DEFF Research Database (Denmark)

    Ghoreishi, Maryam

    2018-01-01

    Many models within the field of optimal dynamic pricing and lot-sizing for deteriorating items assume everything is deterministic and develop a differential equation as the core of analysis. Two prominent examples are the papers by Rajan et al. (Manag Sci 38:240–262, 1992) and Abad (Manag ..., we will try to expose the models by Abad (1996) and Rajan et al. (1992) to stochastic inputs, while designing these stochastic inputs so that they are aligned as closely as possible with the assumptions of those papers. We carry out our investigation through a numerical test in which we test the robustness of the numerical results reported in Rajan et al. (1992) and Abad (1996) in a simulation model. Our numerical results seem to confirm that the results stated in these papers are indeed robust when exposed to stochastic inputs.

  2. The effect of Trier Social Stress Test (TSST) on item and associative recognition of words and pictures in healthy participants

    Directory of Open Access Journals (Sweden)

    Jonathan Guez

    2016-04-01

    Psychological stress, induced by the Trier Social Stress Test (TSST), has repeatedly been shown to alter memory performance. Although factors influencing memory performance, such as stimulus nature (verbal/pictorial) and emotional valence, have been extensively studied, results on whether stress impairs or improves memory are still inconsistent. This study aimed at exploring the effect of TSST on item versus associative memory for neutral, verbal, and pictorial stimuli. Forty-eight healthy subjects were recruited; 24 participants were randomly assigned to the TSST group and the remaining 24 participants were assigned to the control group. Stress reactivity was measured by psychological (subjective state anxiety ratings) and physiological (galvanic skin response recording) measurements. Subjects performed an item-association memory task for both stimulus types (words, pictures) simultaneously, before and after the stress/non-stress manipulation. The results showed that memory recognition for pictorial stimuli was higher than for verbal stimuli. Memory for both words and pictures was impaired following TSST; while the source for this impairment was specific to associative recognition in pictures, a more general deficit was observed for verbal material, as expressed in decreased recognition for both items and associations following TSST. Response latency analysis indicated that the TSST manipulation decreased response time but at the cost of memory accuracy. We conclude that stress does not uniformly affect memory; rather it interacts with the task’s cognitive load and stimulus type. Applying the current study results to patients diagnosed with disorders associated with traumatic stress, our findings in healthy subjects under acute stress provide further support for our assertion that patients’ impaired memory originates in poor recollection processing following depletion of attentional resources.

  3. A single hole tracer test to determine longitudinal dispersion

    International Nuclear Information System (INIS)

    Noy, D.J.; Holmes, D.C.

    1986-03-01

    The paper concerns a single hole tracer test to determine longitudinal dispersion, which is an important parameter in assessing the suitability of a site for radioactive waste disposal. The theory, equipment and procedure for measuring longitudinal dispersion in a single borehole are described. Results are presented for field trials conducted in an aquifer, where the technique produced good results. The measured value of longitudinal dispersion, from a single hole test, relates only to a limited volume of rock immediately adjacent to the borehole. (U.K.)

  4. Passive ultra high frequency radio frequency identification systems for single-item identification in food supply chains

    Directory of Open Access Journals (Sweden)

    Paolo Barge

    2017-02-01

    In the food industry, composition, size, and shape of items are much less regular than in other commodities sectors. In addition, a wide variety of packaging, composed of different materials, is employed. As the material, size and shape of the items to which the tag should be attached strongly influence the minimum power required for tag functioning, performance improvements can be achieved only by selecting suitable radio frequency (RF) identifiers for the specific combination of food product and packaging. When dealing with logistics units, the dynamic reading of a vast number of tags could give rise to simultaneous broadcasting of signals (tag-to-tag collisions) that could affect reading rates and the overall reliability of the identification procedure. This paper reports the results of an analysis of the reading performance of ultra high frequency radio frequency identification systems for multiple static and dynamic electronic identification of packed food products in controlled conditions. Products were considered when arranged on a logistics pallet. The effects on reading rate of different factors, including the product type, the gate configuration, the field polarisation, the power output of the RF reader, the interrogation protocol configuration, the transit speed, the number of tags, and their interactions, were statistically analysed and compared.

  5. Avanços na psicometria: da Teoria Clássica dos Testes à Teoria de Resposta ao Item [Advances in psychometrics: from Classical Test Theory to Item Response Theory]

    Directory of Open Access Journals (Sweden)

    Laisa Marcorela Andreoli Sartes

    2013-01-01

    In the twentieth century, the development and evaluation of the psychometric properties of tests was based mainly on Classical Test Theory (CTT). Many tests are long and redundant, with measures that are influenced by the characteristics of the sample of individuals assessed during their development; some of these limitations are consequences of the use of CTT. Item Response Theory (IRT) emerged as a possible solution to some limitations of CTT, improving the quality of the evaluation of test structure. In this text we critically compare the characteristics of CTT and IRT as methods for evaluating the psychometric properties of tests. The advantages and limitations of each method are discussed.

  6. Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

    Science.gov (United States)

    Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

    2018-01-01

    The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…

  7. 49 CFR 232.305 - Single car air brake tests.

    Science.gov (United States)

    2010-10-01

    ... from a train or when placed on a shop or repair track, as defined in § 232.303(a); (2) A car is on a shop or repair track, as defined in § 232.303(a), for any reason and has not received a single car air...

  8. Changes in Word Usage Frequency May Hamper Intergenerational Comparisons of Vocabulary Skills: An Ngram Analysis of Wordsum, WAIS, and WISC Test Items

    Science.gov (United States)

    Roivainen, Eka

    2014-01-01

    Research on secular trends in mean intelligence test scores shows smaller gains in vocabulary skills than in nonverbal reasoning. One possible explanation is that vocabulary test items become outdated faster compared to nonverbal tasks. The history of the usage frequency of the words on five popular vocabulary tests, the GSS Wordsum, Wechsler…

  9. A Comparison of Item Selection Procedures Using Different Ability Estimation Methods in Computerized Adaptive Testing Based on the Generalized Partial Credit Model

    Science.gov (United States)

    Ho, Tsung-Han

    2010-01-01

    Computerized adaptive testing (CAT) provides a highly efficient alternative to the paper-and-pencil test. By selecting items that match examinees' ability levels, CAT not only can shorten test length and administration time but it can also increase measurement precision and reduce measurement error. In CAT, maximum information (MI) is the most…
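
    The idea of selecting the item that is most informative at an examinee's current ability estimate can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses the dichotomous two-parameter logistic (2PL) model rather than the generalized partial credit model studied in the record above, and the item bank, item identifiers, and function names are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, item_bank, administered):
    """Return the id of the not-yet-administered item with maximum information."""
    best_id, best_info = None, -1.0
    for item_id, (a, b) in item_bank.items():
        if item_id in administered:
            continue
        info = item_information(theta_hat, a, b)
        if info > best_info:
            best_id, best_info = item_id, info
    return best_id


# Hypothetical item bank: id -> (discrimination a, difficulty b).
item_bank = {
    "easy": (1.0, -1.5),
    "medium": (1.2, 0.0),
    "hard": (1.0, 1.5),
}

if __name__ == "__main__":
    for theta_hat in (-2.0, 0.0, 1.4):
        print(theta_hat, select_next_item(theta_hat, item_bank, set()))
```

    In an operational CAT the ability estimate would be updated after every response and the selection repeated; that loop, as well as exposure control and content balancing, is omitted here.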

  10. easyCBM CCSS Math Item Scaling and Test Form Revision (2012-2013): Grades 6-8. Technical Report #1313

    Science.gov (United States)

    Anderson, Daniel; Alonzo, Julie; Tindal, Gerald

    2012-01-01

    The purpose of this technical report is to document the piloting and scaling of new easyCBM mathematics test items aligned with the Common Core State Standards (CCSS) and to describe the process used to revise and supplement the 2012 research version easyCBM CCSS math tests in Grades 6-8. For all operational 2012 research version test forms (10…

  11. The Impact of Partial Measurement Invariance on Testing Moderation for Single and Multi-Level Data

    Directory of Open Access Journals (Sweden)

    Yu-Yu Hsiao

    2018-05-01

    Moderation effect is a commonly used concept in the field of social and behavioral science. Several studies regarding the implication of moderation effects have been done; however, little is known about how partial measurement invariance influences the properties of tests for moderation effects when categorical moderators are used. Additionally, whether the impact is the same across single and multilevel data is still unknown. Hence, the purpose of the present study is twofold: (a) To investigate the performance of the moderation test in single-level studies when measurement invariance does not hold; (b) To examine whether unique features of multilevel data, such as intraclass correlation (ICC) and number of clusters, influence the effect of measurement non-invariance on the performance of tests for moderation. Simulation results indicated that falsely assuming measurement invariance led to biased estimates, inflated Type I error rates, and more gain or more loss in power (depending on simulation conditions) for the test of moderation effects. Such patterns were more salient as sample size and the number of non-invariant items increased for both single- and multi-level data. With multilevel data, the cluster size seemed to have a larger impact than the number of clusters when falsely assuming measurement invariance in the moderation estimation. ICC was trivially related to the moderation estimates. Overall, when testing moderation effects with categorical moderators, employing a model that accounts for the measurement (non)invariance structure of the predictor and/or the outcome is recommended.

  12. The Impact of Partial Measurement Invariance on Testing Moderation for Single and Multi-Level Data.

    Science.gov (United States)

    Hsiao, Yu-Yu; Lai, Mark H C

    2018-01-01

    Moderation effect is a commonly used concept in the field of social and behavioral science. Several studies regarding the implication of moderation effects have been done; however, little is known about how partial measurement invariance influences the properties of tests for moderation effects when categorical moderators are used. Additionally, whether the impact is the same across single and multilevel data is still unknown. Hence, the purpose of the present study is twofold: (a) To investigate the performance of the moderation test in single-level studies when measurement invariance does not hold; (b) To examine whether unique features of multilevel data, such as intraclass correlation (ICC) and number of clusters, influence the effect of measurement non-invariance on the performance of tests for moderation. Simulation results indicated that falsely assuming measurement invariance led to biased estimates, inflated Type I error rates, and more gain or more loss in power (depending on simulation conditions) for the test of moderation effects. Such patterns were more salient as sample size and the number of non-invariant items increased for both single- and multi-level data. With multilevel data, the cluster size seemed to have a larger impact than the number of clusters when falsely assuming measurement invariance in the moderation estimation. ICC was trivially related to the moderation estimates. Overall, when testing moderation effects with categorical moderators, employing a model that accounts for the measurement (non)invariance structure of the predictor and/or the outcome is recommended.

  13. Diagnostic Value of Subjective Memory Complaints Assessed with a Single Item in Dominantly Inherited Alzheimer’s Disease: Results of the DIAN Study

    Directory of Open Access Journals (Sweden)

    Christoph Laske

    2015-01-01

    Objective. We examined the diagnostic value of subjective memory complaints (SMCs) assessed with a single item in a large cross-sectional cohort consisting of families with autosomal dominant Alzheimer’s disease (ADAD) participating in the Dominantly Inherited Alzheimer Network (DIAN). Methods. The baseline sample of 183 mutation carriers (MCs) and 117 noncarriers (NCs) was divided according to the Clinical Dementia Rating (CDR) scale into preclinical (CDR 0; MCs: n=107; NCs: n=109), early symptomatic (CDR 0.5; MCs: n=48; NCs: n=8), and dementia stage (CDR ≥ 1; MCs: n=28; NCs: n=0). These groups were subdivided by the presence or absence of SMCs. Results. At CDR 0, SMCs were present in 12.1% of MCs and 9.2% of NCs (P=0.6). At CDR 0.5, SMCs were present in 66.7% of MCs and 62.5% of NCs (P=1.0). At CDR ≥ 1, SMCs were present in 96.4% of MCs. SMCs in MCs were significantly associated with CDR, logical memory scores, Geriatric Depression Scale, education, and estimated years to onset. Conclusions. The present study shows that SMCs assessed by a single-item scale have no diagnostic value to identify preclinical ADAD in asymptomatic individuals. These results demonstrate the need for further improvement of SMC measures that should be examined in large clinical trials.

  14. Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory

    Directory of Open Access Journals (Sweden)

    Fajrianthi

    2017-11-01

    Fajrianthi,1 Rizqy Amelia Zein2; 1Department of Industrial and Organizational Psychology, 2Department of Personality and Social Psychology, Faculty of Psychology, Universitas Airlangga, Surabaya, East Java, Indonesia. Abstract: This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (level of ability = -2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA’s item analysis and dimensionality test of each TKEA subset. Keywords: categorical confirmatory factor analysis, emotional intelligence, item response theory

  15. Item Banking with Embedded Standards

    Science.gov (United States)

    MacCann, Robert G.; Stanley, Gordon

    2009-01-01

    An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…

  16. Hybrid Testing of Composite Structures with Single-Axis Control

    DEFF Research Database (Denmark)

    Waldbjørn, Jacob Paamand; Høgh, Jacob Herold; Stang, Henrik

    2013-01-01

    Hybrid testing is a substructuring technique where a structure is emulated by modelling a part of it in a numerical model while testing the remainder experimentally. Previous research in hybrid testing has been performed on multi-component structures, e.g. damping fixtures; however, in this paper a hybrid testing platform is introduced for single-component hybrid testing. In this case, the boundary between the numerical model and experimental setup is defined by multiple degrees of freedom (DOFs), which highly complicates the transfer of response between the two substructures. Digital Image Correlation (DIC) is therefore implemented for displacement control of the experimental setup. The hybrid testing setup was verified on a multicomponent structure consisting of a beam loaded in three point bending and a numerical structure of a frame. Furthermore, the stability of the hybrid testing loop...

  17. Primary care validation of a single-question alcohol screening test.

    Science.gov (United States)

    Smith, Peter C; Schmidt, Susan M; Allensworth-Davies, Donald; Saitz, Richard

    2009-07-01

    Unhealthy alcohol use is prevalent but under-diagnosed in primary care settings. To validate, in primary care, a single-item screening test for unhealthy alcohol use recommended by the National Institute on Alcohol Abuse and Alcoholism (NIAAA). Cross-sectional study. Adult English-speaking patients recruited from primary care waiting rooms. Participants were asked the single screening question, “How many times in the past year have you had X or more drinks in a day?”, where X is 5 for men and 4 for women, and a response of 1 or greater [corrected] is considered positive. Unhealthy alcohol use was defined as the presence of an alcohol use disorder, as determined by a standardized diagnostic interview, or risky consumption, as determined using a validated 30-day calendar method. Of 394 eligible primary care patients, 286 (73%) completed the interview. The single-question screen was 81.8% sensitive (95% confidence interval (CI) 72.5% to 88.5%) and 79.3% specific (95% CI 73.1% to 84.4%) for the detection of unhealthy alcohol use. It was slightly more sensitive (87.9%, 95% CI 72.7% to 95.2%) but less specific (66.8%, 95% CI 60.8% to 72.3%) for the detection of a current alcohol use disorder. Test characteristics were similar to those of a commonly used three-item screen, and were affected very little by subject demographic characteristics. The single screening question recommended by the NIAAA accurately identified unhealthy alcohol use in this sample of primary care patients. These findings support the use of this brief screen in primary care.
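
    As a rough illustration of how sensitivity, specificity, and their confidence intervals are obtained from a screening table, the sketch below computes both with a Wilson score interval. The cell counts are hypothetical: they were chosen only so that they sum to 286 and reproduce percentages close to those reported above, and they are not taken from the record. The choice of the Wilson interval is likewise an assumption, since the abstract does not state which method was used.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


def screen_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity, each with a 95% Wilson interval."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


# Hypothetical 2x2 counts (assumption, not from the source): 88 participants
# with the condition and 198 without, 286 in total.
result = screen_accuracy(tp=72, fn=16, tn=157, fp=41)

if __name__ == "__main__":
    for name, (estimate, (low, high)) in result.items():
        print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

    Other interval methods (for example, exact Clopper-Pearson intervals) give slightly different bounds, so agreement with published figures depends on the method chosen.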

  18. European accelerator facilities for single event effects testing

    Energy Technology Data Exchange (ETDEWEB)

    Adams, L; Nickson, R; Harboe-Sorensen, R [ESA-ESTEC, Noordwijk (Netherlands); Hajdas, W; Berger, G

    1997-03-01

    Single event effects are an important hazard to spacecraft and payloads. The advances in component technology, with shrinking dimensions and increasing complexity will give even more importance to single event effects in the future. The ground test facilities are complex and expensive and the complexities of installing a facility are compounded by the requirement that maximum control is to be exercised by users largely unfamiliar with accelerator technology. The PIF and the HIF are the result of experience gained in the field of single event effects testing and represent a unique collaboration between space technology and accelerator experts. Both facilities form an essential part of the European infrastructure supporting space projects. (J.P.N.)

  19. Development and Application of Methods for Estimating Operating Characteristics of Discrete Test Item Responses without Assuming any Mathematical Form.

    Science.gov (United States)

    Samejima, Fumiko

    In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…

  20. Tests of the single-pion exchange model

    International Nuclear Information System (INIS)

    Treiman, S.B.; Yang, C.N.

    1983-01-01

    The single-pion exchange model (SPEM) of high-energy particle reactions provides an attractively simple picture of seemingly complex processes and has accordingly been much discussed in recent times. The purpose of this note is to call attention to the possibility of subjecting the model to certain tests precisely in the domain where the model stands the best chance of making sense.

  1. Fitting a Mixture Rasch Model to English as a Foreign Language Listening Tests: The Role of Cognitive and Background Variables in Explaining Latent Differential Item Functioning

    Science.gov (United States)

    Aryadoust, Vahid

    2015-01-01

    The present study uses a mixture Rasch model to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental…

  2. Automated Scoring of Short-Answer Open-Ended GRE® Subject Test Items. ETS GRE® Board Research Report No. 04-02. ETS RR-08-20

    Science.gov (United States)

    Attali, Yigal; Powers, Don; Freedman, Marshall; Harrison, Marissa; Obetz, Susan

    2008-01-01

    This report describes the development, administration, and scoring of open-ended variants of GRE® Subject Test items in biology and psychology. These questions were administered in a Web-based experiment to registered examinees of the respective Subject Tests. The questions required a short answer of 1-3 sentences, and responses were automatically…

  3. A validated model for the 22-item Sino-Nasal Outcome Test subdomain structure in chronic rhinosinusitis.

    Science.gov (United States)

    Feng, Allen L; Wesely, Nicholas C; Hoehle, Lloyd P; Phillips, Katie M; Yamasaki, Alisa; Campbell, Adam P; Gregorio, Luciano L; Killeen, Thomas E; Caradonna, David S; Meier, Josh C; Gray, Stacey T; Sedaghat, Ahmad R

    2017-12-01

    Previous studies have identified subdomains of the 22-item Sino-Nasal Outcome Test (SNOT-22), reflecting distinct and largely independent categories of chronic rhinosinusitis (CRS) symptoms. However, no study has validated the subdomain structure of the SNOT-22. This study aims to validate the existence of underlying symptom subdomains of the SNOT-22 using confirmatory factor analysis (CFA) and to develop a subdomain model that practitioners and researchers can use to describe CRS symptomatology. A total of 800 patients with CRS were included in this cross-sectional study (400 CRS patients from Boston, MA, and 400 CRS patients from Reno, NV). Their SNOT-22 responses were analyzed using exploratory factor analysis (EFA) to determine the number of symptom subdomains. A CFA was performed to develop a validated measurement model for the underlying SNOT-22 subdomains along with various tests of validity and goodness of fit. EFA demonstrated 4 distinct factors reflecting: sleep, nasal, otologic/facial pain, and emotional symptoms (Cronbach's alpha, >0.7; Bartlett's test of sphericity, p Kaiser-Meyer-Olkin >0.90), independent of geographic locale. The corresponding CFA measurement model demonstrated excellent measures of fit (root mean square error of approximation, 0.95; Tucker-Lewis index, >0.95) and measures of construct validity (heterotrait-monotrait [HTMT] ratio, 0.7), again independent of geographic locale. The use of the 4-subdomain structure for SNOT-22 (reflecting sleep, nasal, otologic/facial pain, and emotional symptoms of CRS) was validated as the most appropriate to calculate SNOT-22 subdomain scores for patients from different geographic regions using CFA. © 2017 ARS-AAOA, LLC.

  4. Creation and validation of the barriers to alcohol reduction (BAR) scale using classical test theory and item response theory.

    Science.gov (United States)

    Kunicki, Zachary J; Schick, Melissa R; Spillane, Nichea S; Harlow, Lisa L

    2018-06-01

    Those who binge drink are at increased risk for alcohol-related consequences when compared to non-binge drinkers. Research shows individuals may face barriers to reducing their drinking behavior, but few measures exist to assess these barriers. This study created and validated the Barriers to Alcohol Reduction (BAR) scale. Participants were college students (n = 230) who endorsed at least one instance of past-month binge drinking (4+ drinks for women or 5+ drinks for men). Using classical test theory, exploratory structural equation modeling found a two-factor structure of personal/psychosocial barriers and perceived program barriers. The sub-factors and full scale had reasonable internal consistency (i.e., coefficient omega = 0.78 (personal/psychosocial), 0.82 (program barriers), and 0.83 (full measure)). The BAR also showed evidence for convergent validity with the Brief Young Adult Alcohol Consequences Questionnaire (r = 0.39, p Theory (IRT) analysis showed the two factors separately met the unidimensionality assumption, and provided further evidence for severity of the items on the two factors. Results suggest that the BAR measure appears reliable and valid for use in an undergraduate student population of binge drinkers. Future studies may want to re-examine this measure in a more diverse sample.

  5. Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing.

    Science.gov (United States)

    Cai, Li

    2015-06-01

    Lord and Wingersky's (Appl Psychol Meas 8:453-461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord-Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord-Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.
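
    As an illustration of the recursion described above, the following is a minimal sketch of the original unidimensional Lord-Wingersky algorithm for dichotomous items, not the dimension-reduced Version 2.0 developed in the paper. The two-parameter logistic model, the item parameters, the five-point grid, and all function names are assumptions chosen for illustration.

```python
import math


def lord_wingersky(probs):
    """Return summed-score likelihoods for dichotomous items at one ability value.

    probs[i] is the probability of a correct response to item i.
    The result L satisfies L[s] = P(summed score = s).
    """
    likelihood = [1.0]  # before any item is added, score 0 is certain
    for p in probs:
        updated = [0.0] * (len(likelihood) + 1)
        for score, value in enumerate(likelihood):
            updated[score] += value * (1.0 - p)  # item answered incorrectly
            updated[score + 1] += value * p      # item answered correctly
        likelihood = updated
    return likelihood


def two_pl(theta, a, b):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def summed_score_eap(items, nodes, weights):
    """Expected a posteriori ability for each summed score on a fixed grid."""
    n_scores = len(items) + 1
    numer = [0.0] * n_scores
    denom = [0.0] * n_scores
    for theta, w in zip(nodes, weights):
        like = lord_wingersky([two_pl(theta, a, b) for a, b in items])
        for s in range(n_scores):
            numer[s] += theta * like[s] * w
            denom[s] += like[s] * w
    return [n / d for n, d in zip(numer, denom)]


# Hypothetical item parameters (a, b) and a coarse, unnormalized normal grid.
items = [(1.0, 0.0), (1.0, 0.0)]
nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [math.exp(-0.5 * x * x) for x in nodes]
eap_table = summed_score_eap(items, nodes, weights)
```

    Here eap_table plays the role of a summed score to scaled score translation table. In a multidimensional model the same loop would run over a direct-product grid, which is where the exponential cost noted in the abstract arises.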

  6. Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory.

    Science.gov (United States)

    Fajrianthi; Zein, Rizqy Amelia

    2017-01-01

    This study aimed to develop an emotional intelligence (EI) test that is suitable for the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 (ability level = -2) for subset 2, and 2.398 (ability level = -2) for subset 3. It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
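
    To make the test information function concrete, here is a minimal sketch that sums item information under a two-parameter logistic model and converts information to a standard error of the ability estimate. The abstract does not state which IRT model was fitted, so the 2PL form, the function names, and the example item parameters are assumptions.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of one 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Test information: the sum of item informations at theta."""
    return sum(item_information_2pl(theta, a, b) for a, b in items)


def standard_error(information):
    """Approximate standard error of the ability estimate."""
    return 1.0 / math.sqrt(information)


# Hypothetical (a, b) parameters for a three-item test.
items = [(1.2, -2.0), (0.8, -1.5), (1.5, -2.2)]
info_at_low_ability = total_information(-2.0, items)
se_for_reported_peak = standard_error(12.183)
```

    Under this reading, the reported peak of 12.183 for subset 2 corresponds to a standard error of about 0.29 at that ability level, which is why a high TIF at a low ability level indicates precise measurement of low-ability examinees.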

  7. A single-question screening test for drug use in primary care.

    Science.gov (United States)

    Smith, Peter C; Schmidt, Susan M; Allensworth-Davies, Donald; Saitz, Richard

    2010-07-12

    Drug use (illicit drug use and nonmedical use of prescription drugs) is common but underrecognized in primary care settings. We validated a single-question screening test for drug use and drug use disorders in primary care. Adult patients recruited from primary care waiting rooms were asked the single screening question, "How many times in the past year have you used an illegal drug or used a prescription medication for nonmedical reasons?" A response of at least 1 time was considered positive for drug use. They were also asked the 10-item Drug Abuse Screening Test (DAST-10). The reference standard was the presence or absence of current (past year) drug use or a drug use disorder (abuse or dependence) as determined by a standardized diagnostic interview. Drug use was also determined by oral fluid testing for common drugs of abuse. Of 394 eligible primary care patients, 286 (73%) completed the interview. The single screening question was 100% sensitive (95% confidence interval [CI], 90.6%-100%) and 73.5% specific (95% CI, 67.7%-78.6%) for the detection of a drug use disorder. It was less sensitive for the detection of self-reported current drug use (92.9%; 95% CI, 86.1%-96.5%) and drug use detected by oral fluid testing or self-report (81.8%; 95% CI, 72.5%-88.5%). Test characteristics were similar to those of the DAST-10 and were affected very little by participant demographic characteristics. The single screening question accurately identified drug use in this sample of primary care patients, supporting the usefulness of this brief screen in primary care.
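
    The sensitivity and specificity figures above are binomial proportions with confidence intervals. A minimal sketch of that calculation follows, using the Wilson score interval; the abstract does not say which interval method the authors used, and the counts below are hypothetical rather than taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / total
                         + z * z / (4.0 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def screening_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity, each paired with a 95% Wilson interval."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, not taken from the study.
result = screening_accuracy(tp=45, fn=5, tn=80, fp=20)
```

    An interval of this kind stays informative when the observed proportion is 100%, as with the reported sensitivity for drug use disorder, because its lower bound remains below 1 while the upper bound equals 1.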

  8. Earthquake acceleration amplification based on single microtremor test

    Science.gov (United States)

    Jaya Syahbana, Arifan; Kurniawan, Rahmat; Soebowo, Eko

    2018-02-01

    Understanding soil dynamics is needed to understand soil behaviour, including the parameters of earthquake acceleration amplification. Many researchers now conduct single microtremor tests to obtain amplification of velocity and natural periods of soil at test sites. However, these amplification parameters are rarely used, so a method is needed to convert the velocity amplification to acceleration amplification. This paper discusses a proposed process for converting the amplification value. The proposed method is to integrate the time histories of the synthetic earthquake acceleration at the soil surface, under the deaggregation at that location, to obtain the time histories of the earthquake velocity. The next step is to fit a curve (the “fitting curve”) between the amplification from a single microtremor test and the amplification of the synthetic earthquake velocity time histories. After the fitted velocity time histories are obtained, they are differentiated to obtain the fitted acceleration time histories. The final step is to compare the acceleration of the “fitting curve” against the acceleration time histories of the synthetic earthquake at bedrock to obtain the single microtremor acceleration amplification factor.

  9. North Star Ambulatory Assessment, 6-minute walk test and timed items in ambulant boys with Duchenne muscular dystrophy.

    Science.gov (United States)

    Mazzone, Elena; Martinelli, Diego; Berardinelli, Angela; Messina, Sonia; D'Amico, Adele; Vasco, Gessica; Main, Marion; Doglio, Luca; Politano, Luisa; Cavallaro, Filippo; Frosini, Silvia; Bello, Luca; Carlesi, Adelina; Bonetti, Anna Maria; Zucchini, Elisabetta; De Sanctis, Roberto; Scutifero, Marianna; Bianco, Flaviana; Rossi, Francesca; Motta, Maria Chiara; Sacco, Annalisa; Donati, Maria Alice; Mongini, Tiziana; Pini, Antonella; Battini, Roberta; Pegoraro, Elena; Pane, Marika; Pasquini, Elisabetta; Bruno, Claudio; Vita, Giuseppe; de Waure, Chiara; Bertini, Enrico; Mercuri, Eugenio

    2010-11-01

    The North Star Ambulatory Assessment is a functional scale specifically designed for ambulant boys affected by Duchenne muscular dystrophy (DMD). Recently the 6-minute walk test has also been used as an outcome measure in trials in DMD. The aim of our study was to assess a large cohort of ambulant boys affected by DMD using both North Star Assessment and 6-minute walk test. More specifically, we wished to establish the spectrum of findings for each measure and their correlation. This is a prospective multicentric study involving 10 centers. The cohort included 112 ambulant DMD boys of age ranging between 4.10 and 17 years (mean 8.18±2.3 SD). Ninety-one of the 112 were on steroids: 37/91 on intermittent and 54/91 on daily regimen. The scores on the North Star assessment ranged from 6/34 to 34/34. The distance on the 6-minute walk test ranged from 127 to 560.6 m. The time to walk 10 m was between 3 and 15 s. The time to rise from the floor ranged from 1 to 27.5 s. Some patients were unable to rise from the floor. As expected the results changed with age and were overall better in children treated with daily steroids. The North Star assessment had a moderate to good correlation with 6-minute walk test and with timed rising from floor but less with 10 m timed walk/run test. The 6-minute walk test in contrast had better correlation with 10 m timed walk/run test than with timed rising from floor. These findings suggest that a combination of these outcome measures can be effectively used in ambulant DMD boys and will provide information on different aspects of motor function that may not be captured using a single measure. Copyright © 2010. Published by Elsevier B.V.

  10. Development of a Mechanical Engineering Test Item Bank to promote learning outcomes-based education in Japanese and Indonesian higher education institutions

    Directory of Open Access Journals (Sweden)

    Jeffrey S. Cross

    2017-11-01

    Following on the 2008-2012 OECD Assessment of Higher Education Learning Outcomes (AHELO) feasibility study of civil engineering, in Japan a mechanical engineering learning outcomes assessment working group was established within the National Institute of Education Research (NIER), which became the Tuning National Center for Japan. The purpose of the project is to develop among engineering faculty members common understandings of engineering learning outcomes, through the collaborative process of test item development, scoring, and sharing of results. By substantiating abstract level learning outcomes into concrete level learning outcomes that are attainable and assessable, and through measuring and comparing the students’ achievement of learning outcomes, it is anticipated that faculty members will be able to draw practical implications for educational improvement at the program and course levels. The development of a mechanical engineering test item bank began with test item development workshops, which led to a series of trial tests, and then to a large scale test implementation in 2016 of 348 first semester master’s students in 9 institutions in Japan, using both multiple choice questions designed to measure the mastery of basic and engineering sciences, and a constructive response task designed to measure “how well students can think like an engineer.” The same set of test items was translated from Japanese into English and Indonesian, and used to measure achievement of learning outcomes at Indonesia’s Institut Teknologi Bandung (ITB) on 37 rising fourth year undergraduate students. This paper highlights how learning outcomes assessment can effectively facilitate learning outcomes-based education, by documenting the experience of Japanese and Indonesian mechanical engineering faculty members engaged in the NIER Test Item Bank project. First published online: 30 November 2017.

  11. The Italian version of the 16-item prodromal questionnaire (iPQ-16): Field-test and psychometric features.

    Science.gov (United States)

    Lorenzo, Pelizza; Silvia, Azzali; Federica, Paterlini; Sara, Garlassi; Ilaria, Scazza; Pupo, Simona; Andrea, Raballo

    2018-03-20

    Among current early screeners for psychosis-risk states, the Prodromal Questionnaire-16 items (PQ-16) is often used. We aimed to assess validity and reliability of the Italian version of the PQ-16 in a young adult help-seeking population. We included 154 individuals aged 18-35 years seeking help at the Reggio Emilia outpatient mental health services in a large semirural catchment area (550,000 inhabitants). Participants completed the Italian version of the PQ-16 (iPQ-16) and were subsequently evaluated with the Comprehensive Assessment of At-Risk Mental States (CAARMS). We examined diagnostic accuracy (i.e. specificity, sensitivity, negative and positive likelihood ratios, and negative and positive predictive values) and content, convergent, and concurrent validity between PQ-16 and CAARMS using Cronbach's alpha, Spearman's rho, and Cohen's kappa, respectively. We also tested the validity of the adopted PQ-16 cut-offs through Receiver Operating Characteristic (ROC) curves plotted against CAARMS diagnoses and the 1-year predictive validity of the PQ-16. The iPQ-16 showed high internal consistency and acceptable diagnostic accuracy and concurrent validity. ROC analyses pointed to a cut-off score of ≥5 as best cut-off. After 12 months of follow-up, 8.7% of participants with a PQ-16 symptom total score of ≥5 who were below the CAARMS psychosis threshold at the baseline developed a psychotic disorder. Psychometric properties of the iPQ-16 were satisfactory. Copyright © 2018. Published by Elsevier B.V.

  12. The Differences among Three-, Four-, and Five-Option-Item Formats in the Context of a High-Stakes English-Language Listening Test

    Science.gov (United States)

    Lee, HyeSun; Winke, Paula

    2013-01-01

    We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed…

  13. An Item Response Theory-Based, Computerized Adaptive Testing Version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

    Science.gov (United States)

    Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe

    2016-01-01

    Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…

  14. Testing for one Generalized Linear Single Order Parameter

    DEFF Research Database (Denmark)

    Ellegaard, Niels Langager; Christensen, Tage Emil; Dyre, Jeppe

    We examine a linear single order parameter model for thermoviscoelastic relaxation in viscous liquids, allowing for a distribution of relaxation times. In this model the relaxation of volume and enthalpy is completely described by the relaxation of one internal order parameter. In contrast to prior...... work the order parameter may be chosen to have a non-exponential relaxation. The model predictions contradict the general consensus of the properties of viscous liquids in two ways: (i) The model predicts that following a linear isobaric temperature step, the normalized volume and enthalpy relaxation...... responses or extrapolate from measurements of a glassy state away from equilibrium. Starting from a master equation description of inherent dynamics, we calculate the complex thermodynamic response functions. We devise a way of testing for the generalized single order parameter model by measuring 3 complex...

  15. Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples.

    Science.gov (United States)

    Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D

    2015-01-01

    To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development (classical test theory [CTT], item response theory [IRT], and Rasch measurement theory [RMT]) in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
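
    As a small illustration of the kind of scale-level statistic a CTT evaluation produces, the sketch below computes Cronbach's alpha from a respondents-by-items score matrix. It is not taken from the paper; the function name and the toy data are assumptions, and sample variances (n - 1 denominator) are used throughout.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per respondent.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Toy data: three perfectly consistent items, then two unrelated items.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
alpha_consistent = cronbach_alpha(consistent)
alpha_unrelated = cronbach_alpha(unrelated)
```

    The two toy matrices mark the extremes: identical items give an alpha of 1, and uncorrelated items give an alpha of 0. A very high alpha can also signal the item redundancy that the abstract attributes to the CTT analysis.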

  16. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    Science.gov (United States)

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  17. Differential Item Functioning Analysis Using a Mixture 3-Parameter Logistic Model with a Covariate on the TIMSS 2007 Mathematics Test

    Science.gov (United States)

    Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.

    2015-01-01

    The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…

  18. Single specimen fracture toughness determination procedure using instrumented impact test

    International Nuclear Information System (INIS)

    Rintamaa, R.

    1993-04-01

    In the study a new single specimen test method and testing facility for evaluating dynamic fracture toughness has been developed. The method is based on the application of a new pendulum type instrumented impact tester equipped with an optical crack mouth opening displacement (COD) extensometer. The fracture toughness measurement technique uses the Double Displacement Ratio (DDR) method, which is based on the assumption that the specimen is deformed as two rigid arms that rotate around an apparent centre of rotation. This apparent centre moves as the crack grows, and the ratio of COD versus specimen displacement changes. As a consequence the onset of ductile crack initiation can be detected on the load-displacement curve. Thus, an energy-based fracture toughness can be calculated. In addition the testing apparatus can use specimens with double the ligament size of the standard Charpy specimen, which makes the impact testing more appropriate from the fracture mechanics point of view. The novel features of the testing facility and the feasibility of the new DDR method have been verified by performing an extensive experimental and analytical study. (99 refs., 91 figs., 27 tabs.)

  19. Item validity vs. item discrimination index: a redundancy?

    Science.gov (United States)

    Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

    2018-03-01

    In the literature on evaluation and test analysis, it is common to find calculations of both item validity and the item discrimination index (D), with a different formula for each. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and item discrimination index in the instrument analysis. It seems that these concepts might overlap, for both reflect the quality of the test in measuring the examinees’ ability. In this paper, examples of data processing results for item validity and item discrimination index are compared. The paper discusses whether item validity and item discrimination index can be represented by only one of them, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses where test analyses are included.
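
    The two quantities being compared can be computed side by side. The sketch below is illustrative only and is not drawn from the paper: it assumes dichotomous (0/1) item scores, takes the upper and lower 27% of examinees by total score for D (a common convention, assumed here), and uses the uncorrected Pearson item-total correlation as the item validity coefficient. The response matrix is hypothetical.

```python
import math


def discrimination_index(responses, item, group_fraction=0.27):
    """Upper-lower discrimination index D for one dichotomous item."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    n_group = max(1, round(group_fraction * len(ranked)))
    upper = ranked[:n_group]
    lower = ranked[-n_group:]
    p_upper = sum(row[item] for row in upper) / n_group
    p_lower = sum(row[item] for row in lower) / n_group
    return p_upper - p_lower


def item_total_correlation(responses, item):
    """Pearson correlation between an item score and the total test score."""
    x = [row[item] for row in responses]
    y = [sum(row) for row in responses]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when an item or total has no variance
    return cov / math.sqrt(var_x * var_y)


# Hypothetical responses: one row per examinee, one column per item.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
d_values = [discrimination_index(responses, i) for i in range(4)]
r_values = [item_total_correlation(responses, i) for i in range(4)]
```

    Both statistics rise when high scorers answer an item correctly more often than low scorers, which is the overlap the abstract raises. D uses only the extreme groups, whereas the correlation uses every examinee.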

  20. Simple test system for single molecule recognition force microscopy

    International Nuclear Information System (INIS)

    Riener, Christian K.; Stroh, Cordula M.; Ebner, Andreas; Klampfl, Christian; Gall, Alex A.; Romanin, Christoph; Lyubchenko, Yuri L.; Hinterdorfer, Peter; Gruber, Hermann J.

    2003-01-01

    We have established an easy-to-use test system for detecting receptor-ligand interactions on the single molecule level using atomic force microscopy (AFM). For this, avidin-biotin, probably the best characterized receptor-ligand pair, was chosen. AFM sensors were prepared containing tethered biotin molecules at sufficiently low surface concentrations appropriate for single molecule studies. A biotin tether, consisting of a 6 nm poly(ethylene glycol) (PEG) chain and a functional succinimide group at the other end, was newly synthesized and covalently coupled to amine-functionalized AFM tips. In particular, PEG 800 diamine was glutarylated, the mono-adduct NH2-PEG-COOH was isolated by ion exchange chromatography and reacted with biotin succinimidylester to give biotin-PEG-COOH which was then activated as N-hydroxysuccinimide (NHS) ester to give the biotin-PEG-NHS conjugate which was coupled to the aminofunctionalized AFM tip. The motional freedom provided by PEG allows for free rotation of the biotin molecule on the AFM sensor and for specific binding to avidin which had been adsorbed to mica surfaces via electrostatic interactions. Specific avidin-biotin recognition events were discriminated from nonspecific tip-mica adhesion by their typical unbinding force (∼40 pN at 1.4 nN/s loading rate), unbinding length (<13 nm), the characteristic nonlinear force-distance relation of the PEG linker, and by specific block with excess of free d-biotin. The convenience of the test system made it possible to evaluate and compare different methods and conditions of tip aminofunctionalization with respect to specific binding and nonspecific adhesion. It is concluded that this system is well suited as a calibration or start-up kit for single molecule recognition force microscopy.

  1. Item level diagnostics and model - data fit in item response theory ...

    African Journals Online (AJOL)

    Item response theory (IRT) is a framework for modeling and analyzing item response data. Item-level modeling gives IRT advantages over classical test theory. The fit of an item score pattern to an item response theory (IRT) model is a necessary condition that must be assessed for further use of item and models that best fit ...

  2. Item-focussed Trees for the Identification of Items in Differential Item Functioning.

    Science.gov (United States)

    Tutz, Gerhard; Berger, Moritz

    2016-09-01

    A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

  3. Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

    Science.gov (United States)

    Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

    2016-05-01

    Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability on item-level and total-score-level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. For relative reliability, we analyzed the total-score-level test-retest reliability with intraclass correlation coefficients (ICC2.1), the standard error of measurement (SEM), and the smallest detectable change (SDC) for absolute reliability/measurement-error assessment and Cronbach's alpha for internal consistency. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we recovered an 11-13% higher chance for a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
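
    The measurement-error statistics reported above are linked by standard formulas. The sketch below assumes the common definitions SEM = SD * sqrt(1 - ICC) and SDC = 1.96 * sqrt(2) * SEM; the abstract does not state the exact formulas the authors used, and the function names and the SD value in the second call are illustrative.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, z=1.96):
    """SDC at the individual level for a 95% confidence level."""
    return z * math.sqrt(2.0) * sem


# SEM as reported in the abstract (rounded to one decimal place).
sdc_from_reported_sem = smallest_detectable_change(2.6)

# Hypothetical standard deviation, combined with the reported ICC of 0.83.
sem_example = standard_error_of_measurement(sd=6.0, icc=0.83)
```

    With the rounded SEM of 2.6 this gives an SDC of about 7.2, close to the reported 7.1; the small gap may simply reflect rounding of the SEM, although that is an inference rather than something the abstract confirms.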

  4. Effects of memantine on cognition in patients with moderate to severe Alzheimer's disease: post-hoc analyses of ADAS-cog and SIB total and single-item scores from six randomized, double-blind, placebo-controlled studies.

    Science.gov (United States)

    Mecocci, Patrizia; Bladström, Anna; Stender, Karina

    2009-05-01

    The post-hoc analyses reported here evaluate the specific effects of memantine treatment on ADAS-cog single-items or SIB subscales for patients with moderate to severe AD. Data from six multicentre, randomised, placebo-controlled, parallel-group, double-blind, 6-month studies were used as the basis for these post-hoc analyses. All patients with a Mini-Mental State Examination (MMSE) score of less than 20 were included. Analyses of patients with moderate AD (MMSE: 10-19), evaluated with the Alzheimer's disease Assessment Scale (ADAS-cog) and analyses of patients with moderate to severe AD (MMSE: 3-14), evaluated using the Severe Impairment Battery (SIB), were performed separately. The mean change from baseline showed a significant benefit of memantine treatment on both the ADAS-cog (p ADAS-cog single-item analyses showed significant benefits of memantine treatment, compared to placebo, for mean change from baseline for commands (p < 0.001), ideational praxis (p < 0.05), orientation (p < 0.01), comprehension (p < 0.05), and remembering test instructions (p < 0.05) for observed cases (OC). The SIB subscale analyses showed significant benefits of memantine, compared to placebo, for mean change from baseline for language (p < 0.05), memory (p < 0.05), orientation (p < 0.01), praxis (p < 0.001), and visuospatial ability (p < 0.01) for OC. Memantine shows significant benefits on overall cognitive abilities as well as on specific key cognitive domains for patients with moderate to severe AD. (c) 2009 John Wiley & Sons, Ltd.

  5. Sources of interference in item and associative recognition memory.

    Science.gov (United States)

    Osth, Adam F; Dennis, Simon

    2015-04-01

    A powerful theoretical framework for exploring recognition memory is the global matching framework, in which a cue's memory strength reflects the similarity of the retrieval cues being matched against the contents of memory simultaneously. Contributions at retrieval can be categorized as matches and mismatches to the item and context cues, including the self match (match on item and context), item noise (match on context, mismatch on item), context noise (match on item, mismatch on context), and background noise (mismatch on item and context). We present a model that directly parameterizes the matches and mismatches to the item and context cues, which enables estimation of the magnitude of each interference contribution (item noise, context noise, and background noise). The model was fit within a hierarchical Bayesian framework to 10 recognition memory datasets that use manipulations of strength, list length, list strength, word frequency, study-test delay, and stimulus class in item and associative recognition. Estimates of the model parameters revealed at most a small contribution of item noise that varies by stimulus class, with virtually no item noise for single words and scenes. Despite the unpopularity of background noise in recognition memory models, background noise estimates dominated at retrieval across nearly all stimulus classes with the exception of high frequency words, which exhibited equivalent levels of context noise and background noise. These parameter estimates suggest that the majority of interference in recognition memory stems from experiences acquired before the learning episode. (c) 2015 APA, all rights reserved).

  6. Concurrent validity and sensitivity to change of Direct Behavior Rating Single-Item Scales (DBR-SIS) within an elementary sample.

    Science.gov (United States)

    Smith, Rhonda L; Eklund, Katie; Kilgus, Stephen P

    2018-03-01

    The purpose of this study was to evaluate the concurrent validity, sensitivity to change, and teacher acceptability of Direct Behavior Rating single-item scales (DBR-SIS), a brief progress monitoring measure designed to assess student behavioral change in response to intervention. Twenty-four elementary teacher-student dyads implemented a daily report card intervention to promote positive student behavior during prespecified classroom activities. During both baseline and intervention, teachers completed DBR-SIS ratings of 2 target behaviors (i.e., Academic Engagement, Disruptive Behavior) whereas research assistants collected systematic direct observation (SDO) data in relation to the same behaviors. Five change metrics (i.e., absolute change, percent of change from baseline, improvement rate difference, Tau-U, and standardized mean difference; Gresham, 2005) were calculated for both DBR-SIS and SDO data, yielding estimates of the change in student behavior in response to intervention. Mean DBR-SIS scores were predominantly moderately to highly correlated with SDO data within both baseline and intervention, demonstrating evidence of the former's concurrent validity. DBR-SIS change metrics were also significantly correlated with SDO change metrics for both Disruptive Behavior and Academic Engagement, yielding evidence of the former's sensitivity to change. In addition, teacher Usage Rating Profile-Assessment (URP-A) ratings indicated they found DBR-SIS to be acceptable and usable. Implications for practice, study limitations, and areas of future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  7. Is a single item stress measure independently associated with subsequent severe injury: a prospective cohort study of 16,385 forest industry employees.

    Science.gov (United States)

    Salminen, Simo; Kouvonen, Anne; Koskinen, Aki; Joensuu, Matti; Väänänen, Ari

    2014-06-02

    A previous review showed that high stress increases the risk of occupational injury by three- to five-fold. However, most of the prior studies have relied on short follow-ups. In this prospective cohort study we examined the effect of stress on recorded hospitalised injuries in an 8-year follow-up. A total of 16,385 employees of a Finnish forest company responded to the questionnaire. Perceived stress was measured with a validated single-item measure, and analysed in relation to recorded hospitalised injuries from 1986 to 2008. We used Cox proportional hazard regression models to examine the prospective associations between work stress, injuries and confounding factors. Highly stressed participants were approximately 40% more likely to be hospitalised due to injury over the follow-up period than participants with low stress. This association remained significant after adjustment for age, gender, marital status, occupational status, educational level, and physical work environment. High stress is associated with an increased risk of severe injury.

  8. Single Stage Contactor Testing Of The Next Generation Solvent Blend

    Energy Technology Data Exchange (ETDEWEB)

    Herman, D. T.; Peters, T. B.; Duignan, M. R.; Williams, M. R.; Poirier, M. R.; Brass, E. A.; Garrison, A. G.; Ketusky, E. T.

    2014-01-06

    The Modular Caustic Side Solvent Extraction (CSSX) Unit (MCU) facility at the Savannah River Site (SRS) is actively pursuing the transition from the current BOBCalixC6 based solvent to the Next Generation Solvent (NGS)-MCU solvent to increase the cesium decontamination factor. To support this integration of NGS into the MCU facility, the Savannah River National Laboratory (SRNL) performed testing of a blend of the NGS (MaxCalix based solvent) with the current solvent (BOBCalixC6 based solvent) for the removal of cesium (Cs) from the liquid salt waste stream. This testing utilized a blend of BOBCalixC6 based solvent and the NGS with the new extractant, MaxCalix, as well as a new suppressor, tris(3,7-dimethyloctyl) guanidine. Single stage tests were conducted using the full size V-05 and V-10 liquid-to-liquid centrifugal contactors installed at SRNL. These tests were designed to determine the mass transfer and hydraulic characteristics with the NGS solvent blended with the projected heel of the BOBCalixC6 based solvent that will exist in MCU at time of transition. The test program evaluated the amount of organic carryover and the droplet size of the organic carryover phases using several analytical methods. The results indicate that the NGS solvent performed hydraulically similar to the current solvent, as expected. For the organic carryover, 93% of the solvent is predicted to be recovered from the stripping operation and 96% from the extraction operation. As for the mass transfer, the NGS solvent significantly improved the cesium DF by at least an order of magnitude when extrapolating the one-stage results to actual seven-stage extraction operation with a stage efficiency of 95%.

  9. Testing the single degenerate channel for supernova Ia

    Science.gov (United States)

    Parsons, Steven

    2014-10-01

    The progenitors of supernova Ia are close binaries containing white dwarfs. Of crucial importance to the evolution of these systems is how much material the white dwarf can stably accrete and hence grow in mass. This occurs during a short-lived intense phase of mass transfer known as the super soft source (SSS) phase. The short duration of this phase and large extinction to soft X-rays means that only a handful are known in our Galaxy. Far more can be learned from the underlying SSS progenitor population of close white dwarf plus FGK type binaries. Unfortunately, these systems are hard to find since the main-sequence stars completely outshine the white dwarfs at optical wavelengths. Because of this, there are currently no known close white dwarf binaries with F, G or early K type companions, making it impossible to determine the contribution of the single degenerate channel towards supernova Ia. Using the GALEX and RAVE surveys we have now identified the first large sample of FGK stars with UV excesses, a fraction of which are these elusive, close systems. Following an intense ground based spectroscopic investigation of these systems, we have identified 5 definite close binaries, with periods of less than a few days. Here we apply for COS spectroscopic observations to measure the mass and temperature of the white dwarfs in order to determine the future evolution of these systems. This will provide a crucial test for the single degenerate channel towards supernova Ia.

  10. Laboratory testing on infiltration in single synthetic fractures

    Science.gov (United States)

    Cherubini, Claudia; Pastore, Nicola; Li, Jiawei; Giasi, Concetta I.; Li, Ling

    2017-04-01

    An understanding of infiltration phenomena in unsaturated rock fractures is extremely important in many branches of engineering for numerous reasons. Sectors such as the oil, gas and water industries are regularly interacting with water seepage through rock fractures, yet the understanding of the mechanics and behaviour associated with this sort of flow is still incomplete. An apparatus has been set up to test infiltration in single synthetic fractures in both dry and wet conditions. To simulate the two fracture planes, concrete fractures have been moulded from 3D printed fractures with varying geometrical configurations, in order to analyse the influence of aperture and roughness on infiltration. Water flows through the single fractures by means of a hydraulic system composed of an upstream and a downstream reservoir, the latter being subdivided into five equal sections in order to measure the flow rate in each part to detect zones of preferential flow. The fractures have been set at various angles of inclination to investigate the effect of this parameter on infiltration dynamics. The results show that altering certain fracture parameters and conditions produces relevant effects on the infiltration process through the fractures. The main variables influencing the formation of preferential flow are the inclination angle of the fracture, the saturation level of the fracture and the mismatch wavelength of the fracture.

  11. Crystal plasticity study of single crystal tungsten by indentation tests

    International Nuclear Information System (INIS)

    Yao, Weizhi

    2012-01-01

    Owing to its favorable material properties, tungsten (W) has been studied as a plasma-facing material in fusion reactors. Experiments on W heating in plasma sources and electron beam facilities have shown an intense micro-crack formation at the heated surface and sub-surface. The cracks go deep inside the irradiated sample, and often large distorted areas caused by local plastic deformation are present around the cracks. To interpret the crack-induced microscopic damage evolution process in W, one needs firstly to understand its plasticity on a single grain level, which is referred to as crystal plasticity. In this thesis, the crystal plasticity of single crystal tungsten (SCW) has been studied by spherical and Berkovich indentation tests and the finite element method with a crystal plasticity model. Appropriate values of the material parameters included in the crystal plasticity model are determined by fitting measured load-displacement curves and pile-up profiles with simulated counterparts for spherical indentation. The numerical simulations reveal excellent agreement with experiment. While the load-displacement curves and the deduced indentation hardness exhibit little sensitivity to the indented plane at small indentation depths, the orientation of slip directions within the crystals governs the development of deformation hillocks at the surface. It is found that several factors like friction, indentation depth, active slip systems, misoriented crystal orientation, misoriented sample surface and azimuthal orientation of the indenter can affect the indentation behavior of SCW. The Berkovich indentation test was also used to study the crystal plasticity of SCW after deuterium irradiation. The critical load (pop-in load) for triggering plastic deformation under the indenter is found to depend on the crystallographic orientation. The pop-in loads decrease dramatically after deuterium plasma irradiation for all three investigated crystallographic planes.

  12. Radiation Tests of Single Photon Avalanche Diode for Space Applications

    Science.gov (United States)

    Moscatelli, Francesco; Marisaldi, Martino; MacCagnani, Piera; Labanti, Claudio; Fuschino, Fabio; Prest, Michela; Berra, Alessandro; Bolognini, Davide; Ghioni, Massimo; Rech, Ivan

    2013-01-01

    Single photon avalanche diodes (SPADs) have recently been studied as photodetectors for applications in space missions. In this presentation we report the results of radiation hardness tests on large-area SPADs (the results refer to SPADs with a 500 micron diameter). Dark count rates as low as a few kHz at -10 degC were obtained for the 500 micron devices before irradiation. We performed bulk damage and total dose radiation tests with protons and gamma-rays in order to evaluate their radiation hardness properties and their suitability for application in a Low Earth Orbit (LEO) space mission. With this aim, SPAD devices were irradiated with up to 20 krad total dose with gamma-rays and 5 krad with protons. The tests performed show that large-area SPADs are very sensitive to proton doses as low as 2×10^8 (1 MeV eq) n/cm^2, with a significant increase in dark count rate (DCR) as well as in the manifestation of the "random telegraph signal" effect. Annealing studies at room temperature (RT) and at 80 degC have been carried out, showing a large decrease of DCR after 24-48 h at RT. Lower proton doses in the range 1-10×10^7 (1 MeV eq) n/cm^2 result in a smaller increase of DCR, suggesting that the large-area SPADs tested in this study are well suited for application in low-inclination LEO, particularly useful for gamma-ray astrophysics.

  13. The single-leg-stance test in Parkinson's disease.

    Science.gov (United States)

    Chomiak, Taylor; Pereira, Fernando Vieira; Hu, Bin

    2015-03-01

    The timed single-leg-stance test (SLST) is widely used to assess postural control in the elderly. In Parkinson's disease (PD), it has been shown that an SLST around 10 seconds or below may be a sensitive indicator of future falls. However, despite its role in fall risk, whether SLST times around 10 seconds mark a clinically important stage of disease progression has largely remained unexplored. A cross-sectional study was conducted in which 27 people with PD were recruited and instructed to undertake timed SLST for both legs. Disease motor impairment was assessed with the Unified Parkinson's Disease Rating Scale Part 3 (UPDRS-III). This study found that: 1) the SLST in people with PD shows good test-retest reliability; 2) SLST values can be attributed to two non-overlapping clusters: a low (10.4 ± 6.3 seconds) and a high (47.6 ± 11.7 seconds) value SLST group; 3) only the low value SLST group can be considered abnormal when age-matched normative SLST data are taken into account for comparison; and 4) lower UPDRS-III motor performance, and the bradykinesia sub-score in particular, are only associated with the low SLST group. These results lend further support to the view that a low SLST time around 10 seconds marks a clinically important stage of disease progression with significant worsening of postural stability in PD.

  14. Precision tests of CPT invariance with single trapped antiprotons

    Energy Technology Data Exchange (ETDEWEB)

    Ulmer, Stefan [RIKEN, Ulmer Initiative Research Unit, Wako, Saitama (Japan); Collaboration: BASE-Collaboration

    2015-07-01

    The reason for the striking imbalance of matter and antimatter in our Universe has yet to be understood. This is the motivation and inspiration to conduct high precision experiments comparing the fundamental properties of matter and antimatter equivalents at lowest energies and with greatest precision. According to theory, the most sensitive tests of CPT invariance are measurements of antihydrogen ground-state hyperfine splitting as well as comparisons of proton and antiproton magnetic moments. Within the BASE collaboration we target the latter. By using a double Penning trap we very recently performed the first direct high precision measurement of the proton magnetic moment. The achieved fractional precision of 3.3 ppb improves the currently accepted literature value by a factor of 2.5. Application of the method to a single trapped antiproton will improve precision of the particle's magnetic moment by more than a factor of 1000, thus providing one of the most stringent tests of CPT invariance. In my talk I report on the status and future perspectives of our efforts.

  15. A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

    Directory of Open Access Journals (Sweden)

    James A Wiley

    We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.
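
    As a rough illustration of the starting point this record builds on, the sketch below computes the score distribution of the basic binomial error model when the latent response probability follows a Beta(a, b) distribution, which gives the beta-binomial distribution. It covers only the equal-difficulty base case, not the authors' extension to items of varying difficulty; the function names and the parameter values in the usage lines are hypothetical.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x, n, a, b):
    """P(score = x) on n dichotomous items.

    Assumes (illustration only): each respondent answers every item
    correctly with the same latent probability p, and p ~ Beta(a, b)
    across respondents. Integrating p out gives
        C(n, x) * B(x + a, n - x + b) / B(a, b).
    """
    return math.comb(n, x) * math.exp(
        log_beta(x + a, n - x + b) - log_beta(a, b)
    )

# Hypothetical 10-item test with latent probabilities ~ Beta(2, 3).
n_items, a, b = 10, 2.0, 3.0
pmf = [beta_binomial_pmf(x, n_items, a, b) for x in range(n_items + 1)]
expected_score = sum(x * p for x, p in enumerate(pmf))
print(round(sum(pmf), 6), round(expected_score, 6))
```

    Because the latent variable is itself a probability, group comparisons in a randomized experiment can be expressed as constraints on a and b, which is the convenience the abstract points to.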

  16. Effects of Misbehaving Common Items on Aggregate Scores and an Application of the Mantel-Haenszel Statistic in Test Equating. CSE Report 688

    Science.gov (United States)

    Michaelides, Michalis P.

    2006-01-01

    Consistent behavior is a desirable characteristic that common items are expected to have when administered to different groups. Findings from the literature have established that items do not always behave in consistent ways; item indices and IRT item parameter estimates of the same items differ when obtained from different administrations.…

  17. Software Note: Using BILOG for Fixed-Anchor Item Calibration

    Science.gov (United States)

    DeMars, Christine E.; Jurich, Daniel P.

    2012-01-01

    The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, are administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…

  18. The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

    Science.gov (United States)

    Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

    2008-10-01

    Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified, cohort study. Participants were 1002 community-dwelling women aged 65 years old or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6 monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatric Society/British Geriatric Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.

  19. A Model of Batch Scheduling for a Single Batch Processor with Additional Setups to Minimize Total Inventory Holding Cost of Parts of a Single Item Requested at Multi-due-date

    Science.gov (United States)

    Hakim Halim, Abdul; Ernawati; Hidayat, Nita P. A.

    2018-03-01

    This paper deals with a model of batch scheduling for a single batch processor on which a number of parts of a single item are to be processed. The process needs two kinds of setups, i.e., main setups required before processing any batches, and additional setups required repeatedly after the batch processor completes a certain number of batches. The parts to be processed arrive at the shop floor at the times coinciding with their respective starting times of processing, and the completed parts are to be delivered at multiple due dates. The objective adopted for the model is that of minimizing total inventory holding cost consisting of holding cost per unit time for a part in completed batches, and that in in-process batches. The formulation of total inventory holding cost is derived from the so-called actual flow time defined as the interval between arrival times of parts at the production line and delivery times of the completed parts. The actual flow time satisfies not only minimum inventory but also just-in-time arrival and delivery. An algorithm to solve the model is proposed and a numerical example is shown.

  20. The construct equivalence and item bias of the pib/SpEEx conceptualisation-ability test for members of five language groups in South Africa

    Directory of Open Access Journals (Sweden)

    Pieter Schaap

    2008-11-01

    This study’s objective was to determine whether the Potential Index Batteries/Situation Specific Evaluation Expert (PIB/SpEEx) conceptualisation (100) ability test displays construct equivalence and item bias for members of five selected language groups in South Africa. The sample consisted of a non-probability convenience sample (N = 6 261) of members of five language groups (speakers of Afrikaans, English, North Sotho, Setswana and isiZulu) working in the medical and beverage industries or studying at higher-educational institutions. Exploratory factor analysis with target rotations confirmed the PIB/SpEEx 100’s construct equivalence for the respondents from these five language groups. No evidence of either uniform or non-uniform item bias of practical significance was found for the sample.

  1. CPM Test-Retest Reliability: "Standard" vs "Single Test-Stimulus" Protocols.

    Science.gov (United States)

    Granovsky, Yelena; Miller-Barmak, Adi; Goldstein, Oren; Sprecher, Elliot; Yarnitsky, David

    2016-03-01

    Assessment of pain inhibitory mechanisms using conditioned pain modulation (CPM) is relevant clinically in prediction of pain and analgesic efficacy. Our objective is to provide necessary estimates of intersession CPM reliability, to enable transformation of the CPM paradigm into a clinical tool. Two cohorts of young healthy subjects (N = 65) participated in two dual-session studies. In Study I, a Bath-Thermode CPM protocol was used, with hot water immersion and contact heat as conditioning- and test-stimuli, respectively, in a classical parallel CPM design introducing test-stimulus first, and then the conditioning- and repeated test-stimuli in parallel. Study II consisted of two CPM protocols: 1) Two-Thermodes, one for each of the stimuli, in the same parallel design as above, and 2) single test-stimulus (STS) protocol with a single administration of a contact heat test-stimulus, partially overlapped in time by a remote shorter contact heat as conditioning stimulus. Test-retest reliability was assessed within 3-7 days. The STS-CPM had superior reliability (intraclass correlation ICC(2,1) = 0.59) over the Bath-Thermode (ICC(2,1) = 0.34) or Two-Thermodes (ICC(2,1) = 0.21) protocols. The hand immersion conditioning pain had higher reliability than thermode pain (ICC(2,1) = 0.76 vs ICC(2,1) = 0.16). Conditioned test-stimulus pain scores were of good (ICC(2,1) = 0.62) or fair (ICC(2,1) = 0.43) reliability for the Bath-Thermode and the STS, respectively, but not for the Two-Thermodes protocol (ICC(2,1) = 0.20). The newly developed STS-CPM paradigm was more reliable than other CPM protocols tested here, and should be further investigated for its clinical relevance. It appears that large contact size of the conditioning-stimulus and use of single rather than dual test-stimulus pain contribute to augmentation of CPM reliability. © 2015 American Academy of Pain Medicine. All rights reserved.
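
    For readers unfamiliar with the reliability coefficient in this record, the sketch below computes a single-measurement, two-way random-effects, absolute-agreement intraclass correlation from a subjects-by-sessions table. It assumes the reported values use the standard Shrout and Fleiss ICC(2,1) form; the abstract does not spell out the formula, and the function name and sample scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per subject and one column per session.

    Uses the two-way ANOVA mean squares:
        (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    where MSR, MSC and MSE are the between-subject, between-session
    and residual mean squares, n is the number of subjects and k the
    number of sessions.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical CPM scores for three subjects measured in two sessions,
# where the second session is shifted upward by one unit for everyone.
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

    Because this form measures absolute agreement, a systematic shift between sessions lowers the coefficient even when subjects keep the same rank order, which is why it suits test-retest questions like the one studied here.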

  2. Item analysis and evaluation in the examinations in the faculty of ...

    African Journals Online (AJOL)

    2014-11-05

    Nov 5, 2014 ... Key words: Classical test theory, item analysis, item difficulty, item discrimination, item response theory, reliability ... the probability of answering an item correctly or of attaining ..... A Monte Carlo comparison of item and person.

  3. Developing Pairwise Preference-Based Personality Test and Experimental Investigation of Its Resistance to Faking Effect by Item Response Model

    Science.gov (United States)

    Usami, Satoshi; Sakamoto, Asami; Naito, Jun; Abe, Yu

    2016-01-01

    Recent years have shown increased awareness of the importance of personality tests in educational, clinical, and occupational settings, and developing faking-resistant personality tests is a very pragmatic issue for achieving more precise measurement. Inspired by Stark (2002) and Stark, Chernyshenko, and Drasgow (2005), we develop a pairwise…

  4. MCQ testing in higher education: Yes, there are bad items and invalid scores—A case study identifying solutions

    OpenAIRE

    Brown, Gavin

    2017-01-01

    This is a lecture given at Umea University, Sweden in September 2017. It is based on the published study: Brown, G. T. L., & Abdulnabi, H. (2017). Evaluating the quality of higher education instructor-constructed multiple-choice tests: Impact on student grades. Frontiers in Education: Assessment, Testing, & Applied Measurement, 2(24). doi:10.3389/feduc.2017.00024

  5. The Impact Analysis of Psychological Reliability of Population Pilot Study For Selection of Particular Reliable Multi-Choice Item Test in Foreign Language Research Work

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Fazeli

    2010-10-01

    The purpose of the research described in the current study is to examine psychological reliability, its importance and application, and to investigate the impact analysis of psychological reliability of a population pilot study for the selection of a particular reliable multiple-choice item test in foreign language research work. The population for subject recruitment was all undergraduate students in the second semester at a large university in Iran (both male and female) who study English as a compulsory paper. In Iran, English is taught as a foreign language.

  6. Spare Items validation

    International Nuclear Information System (INIS)

    Fernandez Carratala, L.

    1998-01-01

    It is increasingly difficult to purchase safety related spare items with manufacturer certifications that maintain the original qualifications of the destination equipment. The main reasons are, on top of the logical evolution of technology applied to newly manufactured components, the closing of nuclear-specific production lines and the evolution of manufacturers' quality systems, originally based on nuclear codes and standards, towards conventional industry standards. To face this problem, for many years different Dedication processes have been implemented to verify whether a commercial grade element is acceptable for use in safety related applications. In the same way, due to our particular position regarding spare part supplies, mainly from markets other than the American, C.N. Trillo has developed a methodology called Spare Items Validation. This methodology, which is originally based on dedication processes, is not a single process but a group of coordinated processes involving engineering, quality and management activities. These are to be performed on the spare item itself, its design control, its fabrication and its supply to allow its use in destinations with specific requirements. The scope of application is not limited to safety related items, but also covers complex design, high cost or plant reliability related components. The implementation in C.N. Trillo has been mainly carried out by merging, modifying and making the most of processes and activities which were already being performed in the company. (Author)

  7. Defining surgical criteria for empty nose syndrome: Validation of the office-based cotton test and clinical interpretability of the validated Empty Nose Syndrome 6-Item Questionnaire.

    Science.gov (United States)

    Thamboo, Andrew; Velasquez, Nathalia; Habib, Al-Rahim R; Zarabanda, David; Paknezhad, Hassan; Nayak, Jayakar V

    2017-08-01

    The validated Empty Nose Syndrome 6-Item Questionnaire (ENS6Q) identifies empty nose syndrome (ENS) patients. The unvalidated cotton test assesses improvement in ENS-related symptoms. By first validating the cotton test using the ENS6Q, we define the minimal clinically important difference (MCID) score for the ENS6Q. Individual case-control study. Fifteen patients diagnosed with ENS and 18 controls with non-ENS sinonasal conditions underwent office cotton placement. Both groups completed ENS6Q testing in three conditions (precotton, cotton in situ, and postcotton) to measure the reproducibility of ENS6Q scoring. Participants also completed a five-item transition scale ranging from "much better" to "much worse" to rate subjective changes in nasal breathing with and without cotton placement. Mean changes for each transition point, and the ENS6Q MCID, were then calculated. In the precotton condition, significant differences (P < .001) in all ENS6Q questions between ENS and controls were noted. With cotton in situ, nearly all prior ENS6Q differences normalized between ENS and control patients. For ENS patients, the changes in the mean differences between the precotton and cotton in situ conditions compared to postcotton versus cotton in situ conditions were insignificant among individuals. Including all 33 participants, the mean change in the ENS6Q between the parameters "a little better" and "about the same" was 4.25 (standard deviation [SD] = 5.79) and -2.00 (SD = 3.70), giving an MCID of 6.25. Cotton testing is a validated office test to assess for ENS patients. Cotton testing also helped to determine the MCID of the ENS6Q, which is a 7-point change from the baseline ENS6Q score. 3b. Laryngoscope, 127:1746-1752, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  8. Empirical versus Random Item Selection in the Design of Intelligence Test Short Forms--The WISC-R Example.

    Science.gov (United States)

    Goh, David S.

    1979-01-01

    The advantages of using psychometric theory to design short forms of intelligence tests are demonstrated by comparing such usage to a systematic random procedure that has previously been used. The Wechsler Intelligence Scale for Children Revised (WISC-R) Short Form is presented as an example. (JKS)

  9. Test-retest reliability of Antonovsky's 13-item sense of coherence scale in patients with hand-related disorders

    DEFF Research Database (Denmark)

    Hansen, Alice Ørts; Kristensen, Hanne Kaae; Cederlund, Ragnhild

    2017-01-01

    to be a powerful tool to measure the ICF component personal factors, which could have an impact on patients' rehabilitation outcomes. Implications for rehabilitation Antonovsky's SOC-13 scale showed test-retest reliability for patients with hand-related disorders. The SOC-13 scale could be a suitable tool to help...... measure personal factors....

  10. Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients

    DEFF Research Database (Denmark)

    Giesinger, Johannes M.; Petersen, Morten Aa.; Grønvold, Mogens

    2011-01-01

    Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture phy...... physical and general fatigue....

  11. On-Demand Testing and Maintaining Standards for General Qualifications in the UK Using Item Response Theory: Possibilities and Challenges

    Science.gov (United States)

    He, Qingping

    2012-01-01

    Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues…

  12. Using Standards and Empirical Evidence to Develop Academic English Proficiency Test Items in Reading. CSE Technical Report 664

    Science.gov (United States)

    Bailey, Alison L.; Stevens, Robin; Butler, Frances A.; Huang, Becky; Miyoshi, Judy N.

    2005-01-01

    The work we report focuses on utilizing linguistic profiles of mathematics, science and social studies textbook selections for the creation of reading test specifications. Once we determined that a text and associated tasks fit within the parameters established in Butler et al. (2004), they underwent both internal and external review by language…

  13. A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

    Science.gov (United States)

    Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

    2011-01-01

    We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

  14. A REVIEW OF SINGLE SPECIES TOXICITY TESTS: ARE THE TESTS RELIABLE PREDICTORS OF AQUATIC ECOSYSTEM COMMUNITY RESPONSES?

    Science.gov (United States)

    This document provides a comprehensive review to evaluate the reliability of indicator species toxicity test results in predicting aquatic ecosystem impacts, also called the ecological relevance of laboratory single species toxicity tests.

  15. Evaluation of the Relative Validity and Test-Retest Reliability of a 15-Item Beverage Intake Questionnaire in Children and Adolescents.

    Science.gov (United States)

    Hill, Catelyn E; MacDougall, Carly R; Riebl, Shaun K; Savla, Jyoti; Hedrick, Valisa E; Davy, Brenda M

    2017-11-01

    Added sugar intake, in the form of sugar-sweetened beverages (SSBs), may contribute to weight gain and obesity development in children and adolescents. A valid and reliable brief beverage intake assessment tool for children and adolescents could facilitate research in this area. The purpose of this investigation was to evaluate the relative validity and test-retest reliability of a 15-item beverage intake questionnaire (BEVQ) for assessing usual beverage intake in children and adolescents. This cross-sectional investigation included four study visits within a 2- to 3-week time period. Participants (333 enrolled; 98% completion rate) were children aged 6 to 11 years and adolescents aged 12 to 18 years recruited from the New River Valley, VA, region from January 2014 to September 2015. Study visits included assessment of height/weight, health history, and four 24-hour dietary recalls (24HRs). The BEVQ was completed at two visits (BEVQ 1, BEVQ 2). To evaluate relative validity, BEVQ 1 was compared with habitual beverage intake determined by the averaged 24HR. To evaluate test-retest reliability, BEVQ 1 was compared with BEVQ 2. Analyses included descriptive statistics, independent sample t tests, χ² tests, one-way analysis of variance, paired sample t tests, and correlational analyses. In the full sample, self-reported water and total SSB intake were not different between BEVQ 1 and 24HR (mean differences 0±1 fl oz and 0±1 fl oz, respectively; both P values >0.05). Reported intake across all beverage categories was significantly correlated between BEVQ 1 and BEVQ 2 (P… beverages was not different (all P values >0.05) between BEVQ 1 and 24HR (mean differences: whole milk=3±4 kcal, reduced-fat milk=9±5 kcal, and fat-free milk=7±6 kcal, which is 7±15 total beverage kilocalories). In adolescents (n=200), water and SSB kilocalories were not different (both P values >0.05) between BEVQ 1 and 24HR (mean differences: -1±1 fl oz and 12±9 kcal, respectively). A 15

  16. Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

    Science.gov (United States)

    Scheuneman, Janice Dowd; Gerritz, Kalle

    1990-01-01

    Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

  17. Pre test parametric studies on single compartment vented enclosure

    International Nuclear Information System (INIS)

    Sharma, Pavan K.; Gera, B.; Singh, R.K.; Vaze, K.K.

    2011-01-01

    Establishing a proper design fire scenario is a challenging task and an essential component of the fire safety design of buildings. A design fire scenario is a qualitative description of the course of a fire over time, identifying key events that characterize the fire (ignition, growth, flashover, fully-developed, and decay stages of fire). Proper fire safety design requires the appropriate selection of design fires against which the performance of the building is evaluated. The selection of the design fires directly impacts all aspects of fire safety performance, including the structural fire resistance, compartmentation against fire spread, egress systems, manual or automatic detection systems, suppression systems, and smoke control. The parameters affecting design fires include the type, amount and arrangement of combustible materials, the ventilation conditions (air supply conditions, door/window open), and the size of the compartment of fire origin. A design fire is a quantitative description of the characteristics of a fire, such as heat release rate (HRR), size of fire and its rate of spread, yield of products of combustion, and hot gas temperatures. Design fires are based on fire scenarios that replicate real fires. Six Computational Fluid Dynamics (CFD) numerical simulations were conducted in order to investigate the effect of fire load on fire dynamics in (a) an ISO corner fire configuration and (b) the IIT Delhi single compartment, 5.0 m long, 5.0 m wide and 5.0 m high, with a doorway opening of 1 m x 3 m and a centre fire of size 0.5 m x 0.5 m. These simulations are carried out as pretest calculations to decide on the instrumentation scheme, safety aspects, and optimization of the proposed experiments for the National Fire Test Facility. The simulation results are summarized in terms of various identified applied parameters, which are useful for understanding the complex fire dynamics, validating the numerical tools against experiments and using them (in form of values

  18. Testing competing hypotheses about single trial fMRI

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Purushotham, Archana; Kim, Seong-Ge

    2002-01-01

    We use a Bayesian framework to compute probabilities of competing hypotheses about functional activation based on single trial fMRI measurements. Within the framework we obtain a complete probabilistic picture of competing hypotheses, hence control of both type I and type II errors....

  19. Single event effect testing of the Intel 80386 family and the 80486 microprocessor

    International Nuclear Information System (INIS)

    Moran, A.; LaBel, K.; Gates, M.; Seidleck, C.; McGraw, R.; Broida, M.; Firer, J.; Sprehn, S.

    1996-01-01

    The authors present single event effect test results for the Intel 80386 microprocessor, the 80387 coprocessor, the 82380 peripheral device, and the 80486 microprocessor. Both single event upset and latchup conditions were monitored.

  20. Análise de itens de uma prova de raciocínio estatístico / Analysis of items of a statistical reasoning test

    Directory of Open Access Journals (Sweden)

    Claudette Maria Medeiros Vendramini

    2004-12-01

    This study aimed to analyze the 18 multiple-choice questions of a test on basic concepts of Statistics using classical and modern test theories. The test was taken by 325 undergraduate students, randomly selected from the Human, Exact and Health Sciences. The analysis indicated that the test is predominantly unidimensional and that the items are better fitted by the three-parameter model. The difficulty, discrimination and biserial correlation indexes show acceptable values. It is suggested that new items be added to the test to achieve reliability and validity for the educational context and to reveal the statistical reasoning of undergraduate students when reading representations of statistical data.
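
    The classical indexes named in this abstract (item difficulty and discrimination) can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names and the toy response matrix are hypothetical, and the discrimination index shown is the point-biserial correlation with the rest score, a close relative of the biserial correlation the abstract reports rather than the same statistic.

```python
from statistics import mean, pstdev


def item_difficulty(responses, item):
    """Classical difficulty index: proportion of examinees who answered the item correctly."""
    return mean(row[item] for row in responses)


def point_biserial(responses, item):
    """Point-biserial correlation between a 0/1 item and the score on the remaining items.

    Using the rest score (total minus the item itself) keeps the item from
    inflating its own discrimination index.
    """
    x = [row[item] for row in responses]
    y = [sum(row) - row[item] for row in responses]
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when the item or the rest score is constant
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


# Hypothetical toy data: 5 examinees (rows) by 4 dichotomous items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulty_item0 = item_difficulty(responses, 0)      # 4 of 5 correct
discrimination_item0 = point_biserial(responses, 0)
```

    A difficulty near 1 marks an easy item, and a positive discrimination means that examinees who do well on the rest of the test also tend to answer the item correctly.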

  1. Numerical test for single concrete armour layer on breakwaters

    OpenAIRE

    Anastasaki, E; Latham, J-P; Xiang, J

    2016-01-01

    The ability of concrete armour units for breakwaters to interlock and form an integral single layer is important for withstanding severe wave conditions. In reality, displacements take place under wave loading, whether they are small and insignificant or large and representing serious structural damage. In this work, a code that combines finite- and discrete-element methods which can simulate motion and interaction among units was used to conduct a numerical investigation. Various concrete ar...

  2. Assessment of free and cued recall in Alzheimer's disease and vascular and frontotemporal dementia with 24-item Grober and Buschke test.

    Science.gov (United States)

    Cerciello, Milena; Isella, Valeria; Proserpi, Alice; Papagno, Costanza

    2017-01-01

    Alzheimer's disease (AD), vascular dementia (VaD) and frontotemporal dementia (FTD) are the most common forms of dementia. It is well known that memory deficits in AD are different from those in VaD and FTD, especially with respect to cued recall. The aim of this clinical study was to compare the memory performance of 15 AD, 10 VaD and 9 FTD patients and 20 normal controls by means of a 24-item Grober-Buschke test [8]. The patient groups were comparable in terms of severity of dementia. We considered free and total recall (free plus cued) in both immediate and delayed recall and computed an Index of Sensitivity to Cueing (ISC) [8] for immediate and delayed trials. We assessed whether cued recall predicted the subsequent free recall across our patient groups. We found that AD patients recalled fewer items from the beginning and were less sensitive to cueing, supporting the hypothesis that memory disorders in AD depend on encoding and storage deficits. In immediate recall, VaD and FTD showed a similar memory performance and a stronger sensitivity to cueing than AD, suggesting that memory disorders in these patients are due to a difficulty in spontaneously implementing efficient retrieval strategies. However, we found a lower ISC in delayed recall compared to the immediate trials in VaD than in FTD, due to greater forgetting in VaD.

  3. Compreensão da leitura: análise do funcionamento diferencial dos itens de um Teste de Cloze / Reading comprehension: differential item functioning analysis of a Cloze Test

    Directory of Open Access Journals (Sweden)

    Katya Luciane Oliveira

    2012-01-01

    The objectives of the present study were to investigate the fit of a Cloze test to the Rasch model and to evaluate differential item functioning (DIF) in relation to gender. The sample comprised 573 students from the 5th to 8th grades of state public elementary schools in the states of São Paulo and Minas Gerais. The Cloze test was administered collectively. The analysis of the instrument revealed a good fit to the Rasch model, and the items were answered according to the expected pattern, also showing good fit. Regarding DIF, only three items differentiated by gender. Based on the data, the answers given by boys and girls were balanced.

  4. Status of the Review of Electric Items in Spain Related to the Post-Fukushima Stress Test Programme

    International Nuclear Information System (INIS)

    Martinez Moreno, Manuel R.; Perez Rodriguez, Alfonso

    2015-01-01

    The Spanish authorities have compiled a comprehensive set of the actions currently related to the post-Fukushima program. The program was initiated at both national and international level and is developed in an Action Plan. This Plan is aligned with the 6 topics identified in the August 2012 CNS-EOM report and is organized in four parts. One of these parts deals with the loss of electrical power, with the clear objective of implementing new features to increase robustness. The program has been reinforced, and the work on electric issues has grown as a consequence of this Plan. The normal tasks of the Electric Systems and I&C Branch will be presented together with the Fukushima-related issues. The Consejo de Seguridad Nuclear (CSN, Nuclear Safety Council) maintains a permanent program of control and surveillance of nuclear safety issues in Spanish nuclear power plants. The Electric Systems and I&C Branch of the CSN has several tasks related to electric issues: - Inspection, control and evaluation of different topics in normal and accident operation. - Surveillance testing inspections. - Design modification inspections and evaluation. - Reactive inspections. - Other activities: participation in the Escered project (begun before the Fukushima accident), whose objective was to analyze external grid stability and to check that electric faults in the vicinity of the NPPs did not cause the simultaneous loss of the offsite supplies or fault effects that interact with related internal systems. Another task relates to the management of aging and long-term operation. As a consequence, its tasks have now been extended with some new Fukushima-related topics: - Beyond-accident analysis related to the U.S. SBO Rule (Reg. Guide 1.155) is part of the design bases for the Spanish plants designed by Westinghouse/General Electric; switchyard/grid events and extreme weather events are considered, with 10 minutes to connect an alternate source (if provided; if not, use of d

  5. Towards an authoring system for item construction

    NARCIS (Netherlands)

    Rikers, Jos H.A.N.

    1988-01-01

    The process of writing test items is analyzed, and a blueprint is presented for an authoring system for test item writing to reduce invalidity and to structure the process of item writing. The developmental methodology is introduced, and the first steps in the process are reported. A historical

  6. Using Cognitive Testing to Develop Items for Surveying Asian American Cancer Patients and Their Caregivers as a Pathway to Culturally Competent Care.

    Science.gov (United States)

    Bolcic-Jankovic, Dragana; Lu, Fengxin; Colten, Mary Ellen; McCarthy, Ellen P

    2016-02-01

    We report the results from cognitive interviews with Asian American patients and their caregivers. We interviewed seven caregivers and six patients who were all bilingual Asian Americans. The main goal of the cognitive interviews was to test a survey instrument developed for a study about perspectives of Asian American patients with advanced cancer who are facing decisions around end-of-life care. We were particularly interested to see whether items commonly used in White and Black populations are culturally meaningful and equivalent in Asian populations, primarily those of Chinese and Vietnamese ethnicity. Our exploration shows that understanding respondents' language proficiency, degree of acculturation, and cultural context of receiving, processing, and communicating information about medical care can help design questions that are appropriate for Asian American patients and caregivers, and therefore can help researchers obtain quality data about the care Asian American cancer patients receive. © The Author(s) 2016.

  7. Furnace System Testing to Support Lower-Temperature Stabilization of High Chloride Plutonium Oxide Items at the Hanford Plutonium Finishing Plant

    International Nuclear Information System (INIS)

    Schmidt, Andrew J.; Gerber, Mark A.; Fischer, Christopher M.; Elmore, Monte R.

    2003-01-01

    High chloride content plutonium (HCP) oxides are impure plutonium oxide scrap which contains NaCl, KCl, MgCl2 and/or CaCl2 salts at potentially high concentrations and must be stabilized at 950 °C per the DOE Standard, DOE-STD-3013-2000. The chlorides pose challenges to stabilization because volatile chloride salts and decomposition products can corrode furnace heating elements and downstream ventilation components. Thermal stabilization of HCP items at 750 °C (without water washing) is being investigated as an alternative method for meeting the intent of DOE-STD-3013-2000. This report presents the results from a series of furnace tests conducted to develop material balance and system operability data for supporting the evaluation of lower-temperature thermal stabilization.

  8. Fuel rod simulator effects in flooding experiments single rod tests

    International Nuclear Information System (INIS)

    Nishida, M.

    1984-09-01

    The influence of a gas-filled gap between cladding and pellet on the quenching behavior of a PWR fuel rod during the reflood phase of a LOCA has been investigated. Flooding experiments were conducted with a short-length, electrically heated single fuel rod simulator surrounded by a glass housing. The gap of 0.05 mm width between the Zircaloy cladding and the internal Al2O3 pellets of the rod was filled either with helium or with argon to vary the radial heat resistance across the gap. This report presents some typical data and an evaluation of the reflood behavior of the fuel rod simulator used. The results show that the quench front propagates faster for increasing heat resistance in the gap between cladding and heat source of the rod. (orig.)

  9. The diagnostic odds ratio: a single indicator of test performance

    NARCIS (Netherlands)

    Glas, Afina S.; Lijmer, Jeroen G.; Prins, Martin H.; Bonsel, Gouke J.; Bossuyt, Patrick M. M.

    2003-01-01

    Diagnostic testing can be used to discriminate subjects with a target disorder from subjects without it. Several indicators of diagnostic performance have been proposed, such as sensitivity and specificity. Using paired indicators can be a disadvantage in comparing the performance of competing
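
    As a hedged illustration of the single indicator named in this record's title, the diagnostic odds ratio (DOR) can be computed from a 2x2 table or from a sensitivity/specificity pair. The Python sketch below uses the standard definition; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR from 2x2 counts: odds of a positive test in the diseased
    divided by odds of a positive test in the non-diseased."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


def dor_from_rates(sensitivity, specificity):
    """Same quantity expressed through the paired indicators sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


# Hypothetical example: 100 diseased (90 test positive), 100 non-diseased (20 test positive).
dor_counts = diagnostic_odds_ratio(tp=90, fp=20, fn=10, tn=80)
dor_rates = dor_from_rates(sensitivity=0.9, specificity=0.8)
```

    Both calls describe the same hypothetical test, so the two values agree up to floating-point rounding. A DOR of 1 means the test does not discriminate at all, and larger values mean better discrimination.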

  10. Subcooler assembly for SSC single magnet test program

    International Nuclear Information System (INIS)

    Wu, K.C.; Brown, D.P.; Sondericker, J.H.; Farah, Y.; Zantopp, D.; Nicoletti, A.

    1991-01-01

    A subcooler assembly has been designed, constructed and installed in the MAGCOOL magnet test area at Brookhaven National Laboratory. Since July 1989, it has been used for testing SSC magnets. This subcooler assembly and cryogenic system are the first of their kind ever built. Today, with more than 5000 hours of operating time, the subcooler has proved to be a reliable unit with individual components meeting design expectations. The lowest temperatures achieved with one SSC dipole are 3.0 K at the suction of the cold vacuum pump and 3.2 K at the return of the magnet. The system performs well in both steady state operation and during magnet quench, subcooling, cooldown and warmup. 4 refs., 7 figs

  11. Investigating Robustness of Item Response Theory Proficiency Estimators to Atypical Response Behaviors under Two-Stage Multistage Testing. ETS GRE® Board Research Report. ETS GRE®-16-03. ETS Research Report No. RR-16-22

    Science.gov (United States)

    Kim, Sooyeon; Moses, Tim

    2016-01-01

    The purpose of this study is to evaluate the extent to which item response theory (IRT) proficiency estimation methods are robust to the presence of aberrant responses under the "GRE"® General Test multistage adaptive testing (MST) design. To that end, a wide range of atypical response behaviors affecting as much as 10% of the test items…

  12. Dynamic tensile test of single PET textile cables

    Directory of Open Access Journals (Sweden)

    Pasco F.

    2012-08-01

    For certain applications, tyre design involves the use of textile cables as reinforcement. During its use, the tyre undergoes temperature variations and dynamic loading rates. Accounting for these conditions in numerical simulations requires knowledge of the sensitivity of the mechanical behaviour to loading rate and temperature. In this paper, we developed an experimental methodology for testing textile cables up to high strain rates. The main difficulty in testing cables is optimizing the fixing of the cable on the machine. For that purpose, we adapted the progressive-binding fixing solution already used in quasi-static tests, while taking into account constraints inherent to high strain rate tests. Firstly, the mass of the grips was decreased in order to make the force signal less sensitive to grip inertia. The method was developed on a high-speed hydraulic machine equipped with a thermal enclosure. The investigated temperatures and strain rates range from room temperature to 373 K (100 °C) and from 0.01 to 100/s, respectively. In addition, the hydraulic machine was equipped with a high-speed video camera. The obtained images were analysed by a tracking technique to measure the average strain in the cable (from 50 to 20000 f/s).

  13. Measuring outcomes in allergic rhinitis: psychometric characteristics of a Spanish version of the congestion quantifier seven-item test (CQ7)

    Directory of Open Access Journals (Sweden)

    Mullol Joaquim

    2011-03-01

    Background: No control tools for nasal congestion (NC) are currently available in Spanish. This study aimed to adapt and validate the Congestion Quantifier Seven Item Test (CQ7) for Spain. Methods: CQ7 was adapted from English following international guidelines. The instrument was validated in an observational, prospective study in allergic rhinitis patients with NC (N = 166) and a control group without NC (N = 35). Participants completed the CQ7, MOS sleep questionnaire, and a measure of psychological well-being (PGWBI). Clinical data included NC severity rating, acoustic rhinometry, and total symptom score (TSS). Internal consistency was assessed using Cronbach's alpha and test-retest reliability using the intraclass correlation coefficient (ICC). Construct validity was tested by examining correlations with other outcome measures and ability to discriminate between groups classified by NC severity. Sensitivity and specificity were assessed using Area under the Receiver Operating Curve (AUC) and responsiveness over time using effect sizes (ES). Results: Cronbach's alpha for the CQ7 was 0.92, and the ICC was 0.81, indicating good reliability. CQ7 correlated most strongly with the TSS (r = 0.60, p… Conclusions: The Spanish version of the CQ7 is appropriate for detecting, measuring, and monitoring NC in allergic rhinitis patients.

  14. Injection molded nanofluidic chips: Fabrication method and functional tests using single-molecule DNA experiments

    DEFF Research Database (Denmark)

    Utko, Pawel; Persson, Karl Fredrik; Kristensen, Anders

    2011-01-01

    We demonstrate that fabrication of nanofluidic systems can be greatly simplified by injection molding of polymers. We functionally test our devices by single-molecule DNA experiments in nanochannels.

  15. Teste de Raciocínio Auditivo Musical (RAu): estudo inicial por meio da Teoria de Reposta ao Item / Test de Raciocinio Auditivo Musical (RAu): estudio inicial a través de la Teoría de Repuesta al Ítem / Auditory Musical Reasoning Test: an initial study with Item Response Theory

    Directory of Open Access Journals (Sweden)

    Fernando Pessotto

    2012-12-01

    This study aimed to seek validity evidence, based on internal structure and on criterion, for an instrument assessing the auditory processing of musical abilities (Teste de Processamento Auditivo com Estímulos Musicais, RAu). To that end, 162 people of both sexes were assessed, 56.8% of them men, aged between 15 and 59 years (M=27.5; SD=9.01). Participants were divided into musicians (N=24), amateurs (N=62) and laypeople (N=76), according to their level of musical knowledge. Full Information Factor Analysis was used to verify the dimensionality of the instrument, and the item properties were examined by means of Item Response Theory (IRT). In addition, the study sought to identify the capacity to discriminate between the groups of musicians and non-musicians. The data provide evidence that the items measure one main dimension (alpha=0.92) with a high capacity to differentiate the groups of professional musicians, amateurs and laypeople, yielding a criterion validity coefficient of r=0.68. The results indicate positive evidence of reliability and validity for the RAu.

  16. Single trapped cold ions: a testing ground for quantum mechanics

    International Nuclear Information System (INIS)

    Maniscalco, S

    2005-01-01

    In this article I review some results obtained during my PhD work in the group of Professor Messina, at the University of Palermo. I discuss some proposals aimed at exploring fundamental issues of quantum theory, e.g. entanglement and quantum superpositions, in the context of single trapped ions. This physical context turns out to be extremely well suited both for studying fundamental features of quantum mechanics, such as the quantum-classical border, and for technological applications such as quantum logic gates and quantum registers. I focus on some procedures for engineering nonclassical states of the vibrational motion of the centre of mass of the ion. I consider both the case in which the ion interacts with classical laser beams and the case of interaction with a quantized mode of light. In particular, I discuss the generation of Schroedinger cat-like states, Bell states and Greenberger-Horne-Zeilinger states. The schemes for generating nonclassical states stem from two different quantum processes: the parity effect and the quantum state manipulation via quantum non-demolition measurement. Finally, I consider a microscopic theory of the interaction of a quantum harmonic oscillator (the centre of mass of the ion in the trapped ion context) with a bosonic thermal environment. Using an exact approach to the dynamics, I discuss a quantum theory of heating of trapped ions able to describe both the short time non-Markovian regime and the thermalization process. I conclude showing briefly how the trapped ion systems can be used as simulators of key models of open quantum systems such as the Caldeira-Leggett model. (phd tutorial)

  17. Concurrent Validation of the Clinical Opiate Withdrawal Scale (COWS) and Single-Item Indices against the Clinical Institute Narcotic Assessment (CINA) Opioid Withdrawal Instrument

    Science.gov (United States)

    Tompkins, D. Andrew; Bigelow, George E.; Harrison, Joseph A.; Johnson, Rolley E.; Fudala, Paul J.; Strain, Eric C.

    2009-01-01

    Introduction: The Clinical Opiate Withdrawal Scale (COWS) is an 11-item clinician-administered scale assessing opioid withdrawal. Though commonly used in clinical practice, it has not been systematically validated. The present study validated the COWS in comparison to the validated Clinical Institute Narcotic Assessment (CINA) scale. Method: Opioid-dependent volunteers were enrolled in a residential trial and stabilized on morphine 30 mg given subcutaneously four times daily. Subjects then underwent double-blind, randomized challenges of intramuscularly administered placebo and naloxone (0.4 mg) on separate days, during which the COWS, CINA, and visual analog scale (VAS) assessments were concurrently obtained. Subjects completing both challenges were included (N=46). Correlations between mean peak COWS and CINA scores as well as self-report VAS questions were calculated. Results: Mean peak COWS and CINA scores of 7.6 and 24.4, respectively, occurred on average 30 minutes post-injection of naloxone. Mean COWS and CINA scores 30 minutes after placebo injection were 1.3 and 18.9, respectively. The Pearson correlation coefficient for peak COWS and CINA scores during the naloxone challenge session was 0.85 (p<0.001). Peak COWS scores also correlated well with peak VAS self-report scores of bad drug effect (r=0.57, p<0.001) and feeling sick (r=0.57, p<0.001), providing additional evidence of concurrent validity. Placebo was not associated with any significant elevation of COWS, CINA, or VAS scores, indicating discriminant validity. Cronbach’s alpha for the COWS was 0.78, indicating good internal consistency (reliability). Discussion: COWS, CINA, and certain VAS items are all valid measurement tools for acute opiate withdrawal. PMID:19647958
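
    Several records in this listing report Cronbach's alpha as a measure of internal consistency. A minimal Python sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is given below. The function name and the toy score matrix are hypothetical, and nothing here reproduces the studies' data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix (list of equal-length rows)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 3 respondents (rows) rated on 2 items (columns).
toy_scores = [
    [1, 2],
    [2, 1],
    [3, 3],
]
toy_alpha = cronbach_alpha(toy_scores)
```

    Items that rank respondents identically push alpha towards 1, while unrelated items push it towards 0, which is why values such as 0.78 or 0.92 are read as acceptable to good internal consistency.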

  18. A Review of Classical Methods of Item Analysis.

    Science.gov (United States)

    French, Christine L.

    Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…

  19. Validation of the Ten-Item Internet Gaming Disorder Test (IGDT-10) and evaluation of the nine DSM-5 Internet Gaming Disorder criteria.

    Science.gov (United States)

    Király, Orsolya; Sleczka, Pawel; Pontes, Halley M; Urbán, Róbert; Griffiths, Mark D; Demetrovics, Zsolt

    2017-01-01

    The inclusion of Internet Gaming Disorder (IGD) in the DSM-5 (Section 3) has given rise to much scholarly debate regarding the proposed criteria and their operationalization. The present study's aim was threefold: to (i) develop and validate a brief psychometric instrument (Ten-Item Internet Gaming Disorder Test; IGDT-10) to assess IGD using definitions suggested in DSM-5, (ii) contribute to ongoing debate regarding the usefulness and validity of each of the nine IGD criteria (using Item Response Theory [IRT]), and (iii) investigate the cut-off threshold suggested in the DSM-5. An online gamer sample of 4887 gamers (age range 14-64 years, mean age 22.2 years [SD=6.4], 92.5% male) was collected through Facebook and a gaming-related website with the cooperation of a popular Hungarian gaming magazine. A shopping voucher of approx. 300 Euros was drawn between participants to boost participation (i.e., lottery incentive). Confirmatory factor analysis and a structural regression model were used to test the psychometric properties of the IGDT-10 and IRT analysis was conducted to test the measurement performance of the nine IGD criteria. Finally, Latent Class Analysis along with sensitivity and specificity analysis were used to investigate the cut-off threshold proposed in the DSM-5. Analysis supported IGDT-10's validity, reliability, and suitability to be used in future research. Findings of the IRT analysis suggest IGD is manifested through a different set of symptoms depending on the level of severity of the disorder. More specifically, "continuation", "preoccupation", "negative consequences" and "escape" were associated with lower severity of IGD, while "tolerance", "loss of control", "giving up other activities" and "deception" criteria were associated with more severe levels. "Preoccupation" and "escape" provided very little information to the estimation of IGD severity. Finally, the DSM-5 suggested threshold appeared to be supported by our statistical analyses. IGDT-10 is

  20. Assessing the test-retest repeatability of the Vietnamese version of the National Eye Institute 25-item Visual Function Questionnaire among bilateral cataract patients for a Vietnamese population.

    Science.gov (United States)

    To, Kien Gia; Meuleners, Lynn; Chen, Huei-Yang; Lee, Andy; Do, Dung Van; Duong, Dat Van; Phi, Tien Duy; Tran, Hoang Huy; Nguyen, Nguyen Do

    2014-06-01

    To determine the test-retest repeatability of the National Eye Institute 25-item Visual Function Questionnaire (NEI VFQ-25) for use with older Vietnamese adults with bilateral cataract. The questionnaire was translated into Vietnamese and back-translated into English by two independent translators. Patients with bilateral cataract aged 50 and older completed the questionnaire on two separate occasions, the second one to two weeks after the first administration. Test-retest repeatability was assessed using Cronbach's α and intraclass correlation coefficients. The average age of participants was 67 ± 8 years and most participants were female (73%). Internal consistency was acceptable, with the α coefficient above 0.7 for all subscales, and intraclass correlation coefficients were 0.6 or greater in all subscales. The Vietnamese NEI VFQ-25 is reliable for use in studies assessing vision-related quality of life in older adults with bilateral cataract in Vietnam. We propose some modifications to the NEI-VFQ questions to reflect activities of older people in Vietnam. © 2013 ACOTA.
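    The internal-consistency figure quoted in the abstract above (α above 0.7) is Cronbach's α. As a minimal sketch of how that coefficient is usually computed, the Python below applies the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name `cronbach_alpha` and the example scores are hypothetical illustrations, not data or code from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    `scores` is a list of rows, one per respondent, each row holding that
    respondent's score on every item of the (sub)scale.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_vars = [variance(col) for col in items]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 respondents x 4 items, for illustration only.
example = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}, acceptable (>0.7): {alpha > 0.7}")
```

    Sample variances are used throughout; using population variances instead gives the same α because the correction factor cancels in the ratio.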

  1. Test procedures and instructions for single shell tank saltcake cesium removal with crystalline silicotitanate

    Energy Technology Data Exchange (ETDEWEB)

    Duncan, J.B.

    1997-01-07

    This document provides specific test procedures and instructions to implement the test plan for the preparation and conduct of a cesium removal test, using Hanford Single Shell Tank Saltcake from tanks 241-BY-110, 241-U-108, 241-U-109, 241-A-101, and 241-S-102, in a bench-scale column. The cesium sorbent to be tested is crystalline silicotitanate. The test plan for which this provides instructions is WHC-SD-RE-TP-024, Hanford Single Shell Tank Saltcake Cesium Removal Test Plan.

  2. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    Science.gov (United States)

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

  3. NEPP Update of Independent Single Event Upset Field Programmable Gate Array Testing

    Science.gov (United States)

    Berg, Melanie; Label, Kenneth; Campola, Michael; Pellish, Jonathan

    2017-01-01

    This presentation provides a NASA Electronic Parts and Packaging (NEPP) Program update of independent Single Event Upset (SEU) Field Programmable Gate Array (FPGA) testing including FPGA test guidelines, Microsemi RTG4 heavy-ion results, Xilinx Kintex-UltraScale heavy-ion results, Xilinx UltraScale+ single event effect (SEE) test plans, development of a new methodology for characterizing SEU system response, and NEPP involvement with FPGA security and trust.

  4. Dutch translation and cross-cultural adaptation of the PROMIS® physical function item bank and cognitive pre-test in Dutch arthritis patients.

    Science.gov (United States)

    Oude Voshaar, Martijn Ah; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart Afj

    2012-03-05

    Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that overall, questions of the Dutch version were understood as they were intended, while only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch speaking populations.

  5. Coeducational or Single-Sex Schools? A Review of the Literature. New Zealand Council for Educational Research, Set 76, Number 1 Item 9.

    Science.gov (United States)

    Irving, James

    This article is part of an informational kit for teachers published by the New Zealand Council for Educational Research. The focus of this article is on the advantages and disadvantages of co-educational and single-sex secondary schools as discussed in research efforts from England and New Zealand. (JLL)

  6. A Balance Sheet for Educational Item Banking.

    Science.gov (United States)

    Hiscox, Michael D.

    Educational item banking presents observers with a considerable paradox. The development of test items from scratch is viewed as wasteful, a luxury in times of declining resources. On the other hand, item banking has failed to become a mature technology despite large amounts of money and the efforts of talented professionals. The question of which…

  7. Definition of Capabilities Needed for a Single Event Effects Test Facility

    International Nuclear Information System (INIS)

    Riemer, Bernie; Gallmeier, Franz X.

    2014-01-01

    The Federal Aviation Administration (FAA) is contemplating new regulations mandating testing of the vulnerability of flight-critical avionics to single event effects (SEE). A limited number of high-energy neutron test facilities currently serve the SEE industrial and institutional research community. The FAA recognizes that existing facilities have insufficient test capacity to meet new demand from such mandates; it desires more flexible irradiation capabilities to test complete, large systems and would like capabilities to address greater concerns for thermal neutrons. For this reason, the FAA funded this study by Spallation Neutron Source (SNS) staff with the ultimate aim of developing options for SEE test facilities using high-energy neutrons at the SNS complex. After an investigation of current SEE test practices and assessment of future testing requirements, three concepts were identified covering a range of test functionality, neutron flux levels, and fidelity to the atmospheric neutron spectrum. The costs and times required to complete each facility were also estimated. SEE testing is generally performed by accelerating the event rate to a point where the effects are still dominated by single events and double event causes of failures are negligible. In practice, acceleration factors of as high as 10⁶ are applicable for component testing, whereas for systems testing acceleration factors of 10⁴ seem to be the upper limit. It is strongly desirable that the irradiation facility be tunable over a large range of high-energy neutron fluxes of 10² to 10⁴ n/cm²/s for systems testing and from 10⁴ to 10⁷ n/cm²/s for components testing. The most capable, most flexible, and highest-test-capacity option is a new stand-alone target station named the High-Energy neutron Test Station (HETS). It is also the most expensive option, with a cost to complete of approximately $100 million. Dual test enclosures would allow for simultaneous testing activity effectively

  8. Definition of Capabilities Needed for a Single Event Effects Test Facility

    Energy Technology Data Exchange (ETDEWEB)

    Riemer, Bernie [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Spallation Neutron Source (SNS); Gallmeier, Franz X. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Spallation Neutron Source (SNS)

    2014-12-01

    The Federal Aviation Administration (FAA) is contemplating new regulations mandating testing of the vulnerability of flight-critical avionics to single event effects (SEE). A limited number of high-energy neutron test facilities currently serve the SEE industrial and institutional research community. The FAA recognizes that existing facilities have insufficient test capacity to meet new demand from such mandates; it desires more flexible irradiation capabilities to test complete, large systems and would like capabilities to address greater concerns for thermal neutrons. For this reason, the FAA funded this study by Spallation Neutron Source (SNS) staff with the ultimate aim of developing options for SEE test facilities using high-energy neutrons at the SNS complex. After an investigation of current SEE test practices and assessment of future testing requirements, three concepts were identified covering a range of test functionality, neutron flux levels, and fidelity to the atmospheric neutron spectrum. The costs and times required to complete each facility were also estimated. SEE testing is generally performed by accelerating the event rate to a point where the effects are still dominated by single events and double event causes of failures are negligible. In practice, acceleration factors of as high as 10⁶ are applicable for component testing, whereas for systems testing acceleration factors of 10⁴ seem to be the upper limit. It is strongly desirable that the irradiation facility be tunable over a large range of high-energy neutron fluxes of 10² to 10⁴ n/cm²/s for systems testing and from 10⁴ to 10⁷ n/cm²/s for components testing. The most capable, most flexible, and highest-test-capacity option is a new stand-alone target station named the High-Energy neutron Test Station (HETS). It is also the most expensive option, with a cost to complete of approximately $100 million. Dual test enclosures would

  9. Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

    Science.gov (United States)

    Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

    2018-02-02

    In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

  10. New decision criteria for selecting delta check methods based on the ratio of the delta difference to the width of the reference range can be generally applicable for each clinical chemistry test item.

    Science.gov (United States)

    Park, Sang Hyuk; Kim, So-Young; Lee, Woochang; Chun, Sail; Min, Won-Ki

    2012-09-01

    Many laboratories use 4 delta check methods: delta difference, delta percent change, rate difference, and rate percent change. However, guidelines regarding decision criteria for selecting delta check methods have not yet been provided. We present new decision criteria for selecting delta check methods for each clinical chemistry test item. We collected 811,920 and 669,750 paired (present and previous) test results for 27 clinical chemistry test items from inpatients and outpatients, respectively. We devised new decision criteria for the selection of delta check methods based on the ratio of the delta difference to the width of the reference range (DD/RR). Delta check methods based on these criteria were compared with those based on the CV% of the absolute delta difference (ADD) as well as those reported in 2 previous studies. The delta check methods suggested by new decision criteria based on the DD/RR ratio corresponded well with those based on the CV% of the ADD except for only 2 items each in inpatients and outpatients. Delta check methods based on the DD/RR ratio also corresponded with those suggested in the 2 previous studies, except for 1 and 7 items in inpatients and outpatients, respectively. The DD/RR method appears to yield more feasible and intuitive selection criteria and can easily explain changes in the results by reflecting both the biological variation of the test item and the clinical characteristics of patients in each laboratory. We suggest this as a measure to determine delta check methods.
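    The four delta check methods named in the abstract above, and the DD/RR ratio, reduce to simple arithmetic on a pair of results. The Python sketch below uses the commonly cited definitions (difference, percent change, and each divided by the time between measurements); the abstract itself does not spell out its formulas, so these definitions, the use of the absolute difference in the ratio, the function name `delta_checks`, and the example values are all assumptions for illustration.

```python
def delta_checks(previous, present, days_between, ref_low, ref_high):
    """Compute the four delta check quantities plus the DD/RR ratio.

    Assumed definitions (not taken from the paper):
      delta difference      = present - previous
      delta percent change  = 100 * (present - previous) / previous
      rate difference       = delta difference / time interval
      rate percent change   = delta percent change / time interval
      DD/RR                 = |delta difference| / (ref_high - ref_low)
    """
    dd = present - previous
    dpc = 100.0 * dd / previous
    return {
        "delta_difference": dd,
        "delta_percent_change": dpc,
        "rate_difference": dd / days_between,
        "rate_percent_change": dpc / days_between,
        "dd_rr_ratio": abs(dd) / (ref_high - ref_low),
    }


# Hypothetical pair of results measured 2 days apart, with a hypothetical
# reference range of 70 to 110 (same units as the results).
result = delta_checks(previous=100, present=120, days_between=2,
                      ref_low=70, ref_high=110)
for name, value in result.items():
    print(f"{name}: {value}")
```

    Normalizing by the reference-range width puts test items with very different measurement scales on one comparable footing, which is the property the abstract relies on when it calls the criteria generally applicable.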

  11. Adaptive screening for depression--recalibration of an item bank for the assessment of depression in persons with mental and somatic diseases and evaluation in a simulated computer-adaptive test environment.

    Science.gov (United States)

    Forkmann, Thomas; Kroehne, Ulf; Wirtz, Markus; Norra, Christine; Baumeister, Harald; Gauggel, Siegfried; Elhan, Atilla Halil; Tennant, Alan; Boecker, Maren

    2013-11-01

    This study conducted a simulation study for computer-adaptive testing based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% female). Unidimensionality of the item bank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. CAT simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residualsLD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 and +2 logits when terminating at SE≤0.32 and 4 items if using SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment. © 2013.

  12. Verification of Differential Item Functioning (DIF) Status of West ...

    African Journals Online (AJOL)

    This study investigated test item bias and Differential Item Functioning (DIF) of West African ... items in chemistry function differentially with respect to gender and location. In Aba education zone of Abia, 50 secondary schools were purposively ...

  13. Results of single borehole hydraulic testing in the Mizunami Underground Research Laboratory project. Phase 2

    International Nuclear Information System (INIS)

    Daimaru, Shuji; Takeuchi, Ryuji; Onoe, Hironori; Saegusa, Hiromitsu

    2012-09-01

    This report summarizes the results of the single borehole hydraulic tests of 79 sections conducted as part of the Construction phase (Phase 2) of the Mizunami Underground Research Laboratory (MIU) Project. The details of each test (test interval depth, geology, etc.) as well as the interpreted hydraulic parameters and the analytical method used are presented in this report. (author)

  14. A strategy for optimizing item-pool management

    NARCIS (Netherlands)

    Ariel, A.; van der Linden, Willem J.; Veldkamp, Bernard P.

    2006-01-01

    Item-pool management requires a balancing act between the input of new items into the pool and the output of tests assembled from it. A strategy for optimizing item-pool management is presented that is based on the idea of a periodic update of an optimal blueprint for the item pool to tune item

  15. German Children’s Use of Word Order and Case Marking to Interpret Simple and Complex Sentences: Testing Differences Between Constructions and Lexical Items

    Science.gov (United States)

    Brandt, Silke; Lieven, Elena; Tomasello, Michael

    2016-01-01

    ABSTRACT Children and adults follow cues such as case marking and word order in their assignment of semantic roles in simple transitives (e.g., the dog chased the cat). It has been suggested that the same cues are used for the interpretation of complex sentences, such as transitive relative clauses (RCs) (e.g., that’s the dog that chased the cat) (Bates, Devescovi, & D’Amico, 1999). We used a pointing paradigm to test German-speaking 3-, 4-, and 6-year-old children’s sensitivity to case marking and word order in their interpretation of simple transitives and transitive RCs. In Experiment 1, case marking was ambiguous. The only cue available was word order. In Experiment 2, case was marked on lexical NPs or demonstrative pronouns. In Experiment 3, case was marked on lexical NPs or personal pronouns. Whereas the younger children mainly followed word order, the older children were more likely to base their interpretations on the more reliable case-marking cue. In most cases, children from both age groups were more likely to use these cues in their interpretation of simple transitives than in their interpretation of transitive RCs. Finally, children paid more attention to nominative case when it was marked on first-person personal pronouns than when it was marked on third-person lexical NPs or demonstrative pronouns, such as der Löwe ‘the-NOM lion’ or der ‘he-NOM.’ They were able to successfully integrate this case-marking cue in their sentence processing even when it appeared late in the sentence. We discuss four potential reasons for these differences across development, constructions, and lexical items. (1) Older children are relatively more sensitive to cue reliability. (2) Word order is more reliable in simple transitives than in transitive RCs. (3) The processing of case marking might initially be item-specific. (4) The processing of case marking might depend on its saliency and position in the sentence. PMID:27019652

  16. NASA Electronic Parts and Packaging Field Programmable Gate Array Single Event Effects Test Guideline Update

    Science.gov (United States)

    Berg, Melanie D.; LaBel, Kenneth A.

    2018-01-01

    The following are updated or new subjects added to the FPGA SEE Test Guidelines manual: academic versus mission-specific device evaluation, single event latch-up (SEL) test and analysis, SEE response visibility enhancement during radiation testing, mitigation evaluation (embedded and user-implemented), unreliable design and its effects on SEE data, testing flushable architectures versus non-flushable architectures, intellectual property core (IP Core) test and evaluation (addresses embedded and user-inserted), heavy-ion energy and linear energy transfer (LET) selection, proton versus heavy-ion testing, fault injection, mean fluence to failure analysis, and mission-specific system-level single event upset (SEU) response prediction. Most sections within the guidelines manual provide information regarding best practices for test structure and test system development. The scope of this manual addresses academic versus mission-specific device evaluation and visibility enhancement in IP Core testing.

  17. UN ANÁLISIS NO PARAMÉTRICO DE ÍTEMS DE LA PRUEBA DEL BENDER/A NONPARAMETRIC ITEM ANALYSIS OF THE BENDER GESTALT TEST MODIFIED

    Directory of Open Access Journals (Sweden)

    César Merino Soto

    2009-05-01

    Full Text Available Abstract: This research presents a psychometric study of a new scoring system for the modified Bender Gestalt Test for children, the Qualitative Scoring System (Brannigan & Brunner, 2002), in a sample of 244 children entering the first grade of primary school in four public schools in Lima. The approach applied is nonparametric item analysis using the Testgraf computer program (Ramsay, 1991). Our findings point to good levels of internal consistency, unidimensionality, and a good discriminative level of the scoring categories of the Qualitative Scoring System. No demographic differences were found with respect to gender or age. We discuss our findings within the context of the potential use of the Qualitative Scoring System and of the nonparametric item analysis approach in psychometric research.

  18. Forecast of thermal-hydrological conditions and air injection test results of the single heater test at Yucca Mountain

    International Nuclear Information System (INIS)

    Birkholzer, J.T.; Tsang, Y.W.

    1996-12-01

    The heater in the Single Heater Test (SHT) in alcove 5 of the Exploratory Studies Facility (ESF) was turned on August 26, 1996. A large number of sensors are installed in the various instrumented boreholes to monitor the coupled thermal-hydrological-mechanical-chemical responses of the rock mass to the heat generated by the single heater. In this report the authors present the results of the modeling of both the heating and cooling phases of the SHT, with a focus on the thermal-hydrological aspect of the coupled processes. The authors also present simulations of air injection tests that will be performed at different stages of the heating and cooling phases of the SHT.

  19. Determining the Sensitivity of CAT-ASVAB (Computerized Adaptive Testing- Armed Services Vocational Aptitude Battery) Scores to Changes in Item Response Curves with the Medium of Administration

    Science.gov (United States)

    1986-08-01

    most examinees. Therefore it appears psychometrically acceptable for the CAT-ASVAB project to proceed without item recalibration based on...MEMORANDUM DETERMINING THE SENSITIVITY OF CAT-ASVAB SCORES TO CHANGES IN ITEM RESPONSE CURVES WITH THE MEDIUM OF ADMINISTRATION D. R. Divgi...Subj: Center for Naval Analyses Research Memorandum 86-189 End: (1) CNA Research Memorandum 86-189, "Determining the Sensitivity of CAT-ASVAB

  20. Intersection tests for single marker QTL analysis can be more powerful than two marker QTL analysis

    Directory of Open Access Journals (Sweden)

    Doerge RW

    2003-06-01

    Full Text Available Abstract Background: It has been reported in the quantitative trait locus (QTL) literature that when testing for QTL location and effect, the statistical power supporting methodologies based on two markers and their estimated genetic map is higher than for the genetic map independent methodologies known as single marker analyses. Close examination of these reports reveals that the two marker approaches are more powerful than single marker analyses only in certain cases. Simulation studies are a commonly used tool to determine the behavior of test statistics under known conditions. We conducted a simulation study to assess the general behavior of an intersection test and a two marker test under a variety of conditions. The study was designed to reveal whether two marker tests are always more powerful than intersection tests, or whether there are cases when an intersection test may outperform the two marker approach. We present a reanalysis of a data set from a QTL study of ovariole number in Drosophila melanogaster. Results: Our simulation study results show that there are situations where the single marker intersection test equals or outperforms the two marker test. The intersection test and the two marker test identify overlapping regions in the reanalysis of the Drosophila melanogaster data. The region identified is consistent with a regression based interval mapping analysis. Conclusion: We find that the intersection test is appropriate for analysis of QTL data. This approach has the advantage of simplicity and for certain situations supplies equivalent or more powerful results than a comparable two marker test.

  1. Applying Item Response Theory methods to design a learning progression-based science assessment

    Science.gov (United States)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats: Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. 
Second, items in multiple formats can be selected to achieve the information criterion at all

  2. The 6-Item Cognitive Impairment Test as a bedside screening for dementia in general hospital patients: results of the General Hospital Study (GHoSt).

    Science.gov (United States)

    Hessler, Johannes Baltasar; Schäufele, Martina; Hendlmeier, Ingrid; Nora Junge, Magdalena; Leonhardt, Sarah; Weber, Joshua; Bickel, Horst

    2017-07-01

    The objective of this study was to examine the psychometric quality of the 6-Item Cognitive Impairment Test (6CIT) as a bedside screening for the detection of dementia in general hospital patients. Participants (N = 1,440) were inpatients aged ≥65 of 33 randomly selected general hospitals in Southern Germany. The 6CIT was conducted at bedside, and dementia was diagnosed according to DSM-IV. Nursing staff was asked to rate the patients' cognitive status, and previous diagnoses of dementia were extracted from medical records. Completion rates and validity statistics were calculated. Two-hundred seventy patients had dementia. Cases with delirium but no dementia were excluded. Feasibility was 97.9% and 83.3% for patients without and with dementia, respectively, and decreased from moderate (93.8%) to severe dementia (53%). The area under the curve of the 6CIT was 0.98. Sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were calculated for the cutoffs 7/8 (0.96, 0.82, 0.85, 0.52, 0.99) and 10/11 (0.88, 0.95, 0.94, 0.76, 0.98). The nurse ratings and medical records information had lower validity statistics. Logistic regression analyses revealed that the 6CIT statistically significantly provided information above nurse ratings and medical records. Twenty-five and 37 additional patients were correctly classified by the 7/8 and 10/11 cutoffs, respectively. The 6CIT is a feasible and valid screening tool for the detection of dementia in older general hospital patients. The 6CIT outperformed the nurse ratings of cognitive status and dementia diagnoses from medical records, suggesting that standardized screening may have benefits with regard to case finding. Copyright © 2016 John Wiley & Sons, Ltd.
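    The five validity statistics reported per cutoff in the abstract above all derive from one 2x2 table of screening result against reference diagnosis. The Python sketch below shows the standard definitions; the function name `diagnostic_stats` and the counts in the example are hypothetical and are not the study's data.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Validity statistics from a 2x2 screening table.

    tp: screen positive, condition present   fp: screen positive, condition absent
    fn: screen negative, condition present   tn: screen negative, condition absent
    """
    return {
        "sensitivity": tp / (tp + fn),          # detected share of true cases
        "specificity": tn / (tn + fp),          # cleared share of non-cases
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
    }


# Hypothetical counts, for illustration only.
stats = diagnostic_stats(tp=90, fp=20, fn=10, tn=80)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

    Raising a cutoff typically trades sensitivity for specificity, which matches the pattern the abstract reports between the 7/8 and 10/11 cutoffs. Predictive values additionally depend on how common the condition is in the screened population.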

  3. The comparative performance of the single intradermal test and the single intradermal comparative tuberculin test in Irish cattle, using tuberculin PPD combinations of differing potencies.

    Science.gov (United States)

    Good, M; Clegg, T A; Costello, E; More, S J

    2011-11-01

    In national bovine tuberculosis (BTB) control programmes, testing is generally conducted using a single source of bovine purified protein derivative (PPD) tuberculin. Alternative tuberculin sources should be identified as part of a broad risk management strategy as problems of supply or quality cannot be discounted. This study was conducted to compare the impact of different potencies of a single bovine PPD tuberculin on the field performance of the single intradermal comparative tuberculin test (SICTT) and single intradermal test (SIT). Three trial potencies of bovine PPD tuberculin, as assayed in naturally infected bovines, namely, low (1192 IU/dose), normal (6184 IU/dose) and high (12,554 IU/dose) were used. Three SICTTs (using) were conducted on 2102 animals. Test results were compared based on reactor-status and changes in skin-thickness at the bovine tuberculin injection site. There was a significant difference in the number of reactors detected using the high and low potency tuberculins. In the SICTT, high and low potency tuberculin detected 40% more and 50% fewer reactors, respectively, than normal potency tuberculin. Furthermore, use of the low potency tuberculin in the SICTT failed to detect 20% of 35 animals with visible lesions, and in the SIT 11% of the visible lesion animals would have been classified as negative. Tuberculin potency is critical to the performance of both the SICTT and SIT. Tuberculin of different potencies will affect reactor disclosure rates, confounding between-year or between-country comparisons. Independent checks of tuberculin potency are an important aspect of quality control in national BTB control programmes. Copyright © 2011 Elsevier Ltd. All rights reserved.

  4. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    Science.gov (United States)

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
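
    The minimum p-value procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two candidate statistics (difference in means and difference in medians), the function names, and the default of 999 permutations are all assumptions chosen for brevity. The reference distribution of the minimum p-value is built from the same permutations that produce the individual p-values, which is what keeps the type I error rate at its nominal level.

```python
import random
import statistics
from bisect import bisect_left


def mean_diff(a, b):
    """Illustrative statistic 1: difference in group means."""
    return statistics.fmean(a) - statistics.fmean(b)


def median_diff(a, b):
    """Illustrative statistic 2: difference in group medians."""
    return statistics.median(a) - statistics.median(b)


def min_p_permutation_test(treated, control,
                           stats=(mean_diff, median_diff),
                           n_perm=999, seed=0):
    """Return a permutation p-value for the minimum p-value statistic."""
    rng = random.Random(seed)
    pooled = list(treated) + list(control)
    n_t = len(treated)

    # Row 0 holds the observed |statistics|; rows 1..n_perm hold permuted ones.
    rows = [[abs(s(treated, control)) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        rows.append([abs(s(pooled[:n_t], pooled[n_t:])) for s in stats])

    # Turn every statistic in every row into a permutation p-value,
    # then keep the smallest p-value per row.
    min_p = [1.0] * len(rows)
    for k in range(len(stats)):
        column = sorted(r[k] for r in rows)
        for i, r in enumerate(rows):
            # Share of rows at least as extreme as this row for statistic k.
            p = (len(column) - bisect_left(column, r[k])) / len(column)
            min_p[i] = min(min_p[i], p)

    # Adjusted p-value: share of rows whose min-p is as small as the observed one.
    return sum(1 for m in min_p if m <= min_p[0]) / len(min_p)
```

    Rejecting the null hypothesis when the returned value is at most the chosen significance level corresponds to comparing the observed minimum p-value with a permutation-based critical value.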

  5. Role of optometry school in single day large scale school vision testing

    Science.gov (United States)

    Anuradha, N; Ramani, Krishnakumar

    2015-01-01

    Background: School vision testing aims at identification and management of refractive errors. Large-scale school vision testing using conventional methods is time-consuming and demands a lot of chair time from eye care professionals. A new strategy involving a school of optometry in single day large scale school vision testing is discussed. Aim: The aim was to describe a new approach of performing vision testing of school children on a large scale in a single day. Materials and Methods: A single day vision testing strategy was implemented wherein 123 members (20 teams comprising optometry students and headed by optometrists) conducted vision testing for children in 51 schools. School vision testing included basic vision screening, refraction, frame measurements, frame choice and referrals for other ocular problems. Results: A total of 12448 children were screened, among whom 420 (3.37%) were identified to have refractive errors. Of these, 28 (1.26%) children belonged to the primary, 163 (9.80%) to the middle, 129 (4.67%) to the secondary and 100 (1.73%) to the higher secondary levels of education, respectively. 265 (2.12%) children were referred for further evaluation. Conclusion: Single day large scale school vision testing can be adopted by schools of optometry to reach a higher number of children within a short span. PMID:25709271

  6. A single standard for in-place testing of DOE HEPA filters - not

    Energy Technology Data Exchange (ETDEWEB)

    Mokler, B.V. [Los Alamos National Laboratory, NM (United States)

    1995-02-01

    This article is a review of arguments against the use of a single standard for in-place testing of DOE HEPA filters. The author feels that the term "standard" entails mandatory compliance. Additionally, the author feels that the variety of DOE HEPA systems requiring in-place testing is such that the guidance for testing must be written in a permissive fashion, allowing options and alternatives. With this in mind, it is not possible to write a single document entailing mandatory compliance for all DOE facilities.

  7. Evidence of the Generalization and Construct Representation Inferences for the "GRE"® Revised General Test Sentence Equivalence Item Type. ETS GRE® Board Research Report. ETS GRE®-17-02. ETS Research Report. RR-17-05

    Science.gov (United States)

    Bejar, Isaac I.; Deane, Paul D.; Flor, Michael; Chen, Jing

    2017-01-01

    The report is the first systematic evaluation of the sentence equivalence item type introduced by the "GRE"® revised General Test. We adopt a validity framework to guide our investigation based on Kane's approach to validation whereby a hierarchy of inferences that should be documented to support score meaning and interpretation is…

  8. Diagnostic dilemma of the single screening test used in the diagnosis of syphilis in Nepal.

    Science.gov (United States)

    Dumre, S P; Shakya, G; Acharya, D; Malla, S; Adhikari, N

    2011-12-01

    Syphilis screening by the nontreponemal rapid plasma reagin (RPR) test is not usually followed up by specific treponemal tests in most of the resource-poor healthcare settings of Nepal. We analyzed serum specimens of 504 suspected syphilis cases at the immunology department of the national reference laboratory in Nepal during 2007-2009 using the RPR test and the Treponema pallidum hemagglutination assay (TPHA). Overall, 35.7% were positive by both methods, 13.1% were RPR positive and TPHA negative, 8.7% were positive by TPHA only, and 42.5% were negative by both methods. Among the RPR reactive specimens (n = 246), 73.2% were positive by TPHA. Non-specific agglutination was relatively higher in RPR testing (26.8%) than in TPHA (19.6%). Although TPHA was found to be more specific than the RPR test, either test used alone produced inaccurate diagnoses. Since RPR testing alone may yield false positive results for syphilis, a specific treponemal test should be routinely used as a confirmatory test to rule out false RPR positive cases. More attention needs to be paid to formulating a strict policy on the implementation of the existing guidelines throughout the country to prevent misdiagnosis of syphilis through the use of the RPR test alone.
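
    The percentages in this abstract can be checked against one another with a small cross-tabulation. The four cell counts below are not given in the abstract; they are inferred here by rounding the reported percentages of the 504 specimens, so treat them as an assumption. The function name is likewise illustrative.

```python
def summarize_two_tests(both_pos, rpr_only, tpha_only, both_neg):
    """Recompute the reported percentages from a 2x2 RPR/TPHA table."""
    total = both_pos + rpr_only + tpha_only + both_neg
    rpr_reactive = both_pos + rpr_only
    tpha_reactive = both_pos + tpha_only

    def pct(part, whole):
        return round(100.0 * part / whole, 1)

    return {
        "total": total,
        "rpr_reactive": rpr_reactive,
        "pct_both_pos": pct(both_pos, total),
        "pct_rpr_only": pct(rpr_only, total),
        "pct_tpha_only": pct(tpha_only, total),
        "pct_both_neg": pct(both_neg, total),
        # Share of RPR-reactive specimens confirmed by TPHA.
        "pct_rpr_confirmed": pct(both_pos, rpr_reactive),
        # Share of reactive results of each test not supported by the other test.
        "pct_rpr_unconfirmed": pct(rpr_only, rpr_reactive),
        "pct_tpha_unconfirmed": pct(tpha_only, tpha_reactive),
    }


# Cell counts inferred from the reported percentages (assumption).
summary = summarize_two_tests(both_pos=180, rpr_only=66, tpha_only=44, both_neg=214)
```

    With these inferred counts, the recomputed figures agree with the abstract: 246 RPR reactive specimens, 73.2% of them TPHA positive, and 26.8% versus 19.6% of reactive results unsupported by the other test.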

  9. Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

    Science.gov (United States)

    Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

    2018-03-01

    This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.

  10. Homogeneity tests for variances and mean test under heterogeneity conditions in a single way ANOVA method

    International Nuclear Information System (INIS)

    Morales P, J.R.; Avila P, P.

    1996-01-01

    Considering the maximum permissible levels established for oysters, collecting oysters at the four stations of the El Chijol Channel (Veracruz, Mexico), as well as along the channel itself, should be forbidden, because the metal concentrations studied exceed these limits. In this case the application of Welch tests was not necessary. For the water hyacinth, the treatment means were unequal for Fe, Cu, Ni, and Zn. This case is more illustrative, because the conclusion was reached by applying the Welch tests to treatments with heterogeneous variances. (Author)
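
    For readers unfamiliar with the Welch test for comparing means under unequal variances in a one-way layout, a minimal sketch of the standard Welch statistic follows. It is not taken from this record; the function name and return layout are assumptions, and the p-value step is omitted because it needs an F distribution that the Python standard library does not provide.

```python
import statistics


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way test of equal means."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    # Each group is weighted by the inverse variance of its mean.
    w = [n_i / statistics.variance(g) for n_i, g in zip(n, groups)]
    w_sum = sum(w)
    grand = sum(w_i * m for w_i, m in zip(w, means)) / w_sum

    between = sum(w_i * (m - grand) ** 2 for w_i, m in zip(w, means)) / (k - 1)
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))

    f_stat = between / (1 + 2 * (k - 2) * lam / (k * k - 1))
    df2 = (k * k - 1) / (3 * lam)
    return f_stat, k - 1, df2
```

    With two groups the statistic reduces to the square of Welch's t statistic, and the second degrees of freedom reduce to the Welch-Satterthwaite approximation, which gives a convenient sanity check.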

  11. The Heteroscedastic Graded Response Model with a Skewed Latent Trait: Testing Statistical and Substantive Hypotheses Related to Skewed Item Category Functions

    Science.gov (United States)

    Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul

    2012-01-01

    The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…

  12. An Item Response Theory–Based, Computerized Adaptive Testing Version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

    DEFF Research Database (Denmark)

    Makransky, Guido; Dale, Philip S.; Havmose, Philip

    2016-01-01

    precision. Method: Parent-reported vocabulary for the American CDI:WS norming sample, consisting of 1461 children between the ages of 16 and 30 months, was used to investigate the fit of the items to the 2 parameter logistic (2-PL) IRT model, and to simulate CDI-CAT versions with 400, 200, 100, 50, 25, 10 and 5...

  13. Single Event Effects Test Facility Options at the Oak Ridge National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Riemer, Bernie [ORNL; Gallmeier, Franz X [ORNL; Dominik, Laura J [ORNL

    2015-01-01

    Increasing use of microelectronics of ever diminishing feature size in avionics systems has led to a growing Single Event Effects (SEE) susceptibility arising from the highly ionizing interactions of cosmic rays and solar particles. Single event effects caused by atmospheric radiation have been recognized in recent years as a design issue for avionics equipment and systems. To ensure a system meets all its safety and reliability requirements, SEE induced upsets and potential system failures need to be considered, including testing of the components and systems in a neutron beam. Testing of integrated circuits (ICs) and systems for use in radiation environments requires the utilization of highly advanced laboratory facilities that can run evaluations on microcircuits for the effects of radiation. This paper provides a background of the atmospheric radiation phenomenon and the resulting single event effects, including single event upset (SEU) and latch up conditions. A study investigating requirements for future single event effect irradiation test facilities and developing options at the Spallation Neutron Source (SNS) is summarized. The relatively new SNS with its 1.0 GeV proton beam, typical operation of 5000 h per year, expertise in spallation neutron sources, user program infrastructure, and decades of useful life ahead is well suited for hosting a world-class SEE test facility in North America. Emphasis was put on testing of large avionics systems while still providing tunable high flux irradiation conditions for component tests. Makers of ground-based systems would also be served well by these facilities. Three options are described; the most capable, flexible, and highest-test-capacity option is a new stand-alone target station using about one kW of proton beam power on a gas-cooled tungsten target, with dual test enclosures. Less expensive options are also described.

  14. Ambiguity in measuring matrix diffusion with single-well injection/recovery tracer tests

    Science.gov (United States)

    Lessoff, S.C.; Konikow, Leonard F.

    1997-01-01

    Single-well injection/recovery tracer tests are considered for use in characterizing and quantifying matrix diffusion in dual-porosity aquifers. Numerical modeling indicates that neither regional drift in homogeneous aquifers, nor heterogeneity in aquifers having no regional drift, nor hydrodynamic dispersion significantly affects these tests. However, when drift is coupled simultaneously with heterogeneity, they can have significant confounding effects on tracer return. This synergistic effect of drift and heterogeneity may help explain irreversible flow and inconsistent results sometimes encountered in previous single-well injection/recovery tracer tests. Numerical results indicate that in a hypothetical single-well injection/recovery tracer test designed to demonstrate and measure dual-porosity characteristics in a fractured dolomite, the simultaneous effects of drift and heterogeneity sometimes yield responses similar to those anticipated in a homogeneous dual-porosity formation. In these cases, tracer recovery could provide a false indication of the occurrence of matrix diffusion. Shortening the shut-in period between injection and recovery periods may make the test less sensitive to drift. Using multiple tracers having different diffusion characteristics, multiple tests having different pumping schedules, and testing the formation at more than one location would decrease the ambiguity in the interpretation of test data.

  15. Comparative Performance of Four Single Extreme Outlier Discordancy Tests from Monte Carlo Simulations

    Directory of Open Access Journals (Sweden)

    Surendra P. Verma

    2014-01-01

    Using highly precise and accurate Monte Carlo simulations of 20,000,000 replications and 102 independent simulation experiments with extremely low simulation errors and total uncertainties, we evaluated the performance of four single outlier discordancy tests (Grubbs test N2, Dixon test N8, skewness test N14, and kurtosis test N15) for normal samples of sizes 5 to 20. Statistical contaminations of a single observation resulting from parameters called δ from ±0.1 up to ±20 for modeling the slippage of central tendency or ε from ±1.1 up to ±200 for slippage of dispersion, as well as no contamination (δ=0 and ε=±1), were simulated. Because of the use of precise and accurate random and normally distributed simulated data, very large replications, and a large number of independent experiments, this paper presents a novel approach for precise and accurate estimations of power functions of four popular discordancy tests and, therefore, should not be considered as a simple simulation exercise unrelated to probability and statistics. From both criteria of the Power of Test proposed by Hayes and Kinsella and the Test Performance Criterion of Barnett and Lewis, Dixon test N8 performs less well than the other three tests. The overall performance of these four tests could be summarized as N2≅N15>N14>N8.
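
    The simulation design described above (estimate a critical value from uncontaminated normal samples, then count rejections when one observation is shifted by δ) can be sketched on a much smaller scale. This is a toy illustration, not the authors' procedure: it assumes the two-sided Grubbs-type statistic max|x - mean|/s, uses only a few thousand replications, and covers slippage of central tendency only. All function names and defaults are hypothetical.

```python
import random
import statistics


def grubbs_statistic(sample):
    """Largest absolute deviation from the mean, in sample standard deviations."""
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return max(abs(x - m) for x in sample) / s


def simulate_statistics(n, delta, reps, rng):
    """Statistic values for N(0, 1) samples with one observation shifted by delta."""
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample[0] += delta  # contaminate a single observation
        values.append(grubbs_statistic(sample))
    return values


def estimate_power(n=10, delta=5.0, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo estimate of the rejection rate at significance level alpha."""
    rng = random.Random(seed)
    null_values = sorted(simulate_statistics(n, 0.0, reps, rng))
    # Empirical (1 - alpha) quantile of the uncontaminated distribution.
    critical = null_values[min(int((1 - alpha) * reps), reps - 1)]
    contaminated = simulate_statistics(n, delta, reps, rng)
    return sum(1 for t in contaminated if t > critical) / reps
```

    Calling `estimate_power(delta=0.0)` should return a value close to `alpha`, while larger values of `delta` should push the estimate towards 1; the precision of either figure is limited by the small number of replications used here.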

  16. Microwave testing of high-Tc based direct current to a single flux quantum converter

    DEFF Research Database (Denmark)

    Kaplunenko, V. K.; Fischer, Gerd Michael; Ivanov, Z. G.

    1994-01-01

    Design, simulation, and experimental investigations of a direct current to a single flux quantum converter loaded with a Josephson transmission line and driven by an external 70 GHz microwave oscillator are reported. The test circuit includes nine YBaCuO Josephson junctions aligned on the grain...... boundary of a 0°–32° asymmetric Y-ZrO2 bicrystal substrate. The performance of such converters is important for the development of the fast Josephson samplers required for testing of high-Tc rapid single flux quantum circuits in high-speed digital superconducting electronics. Journal of Applied Physics...

  17. Testing of the large bore single aperture 1-meter superconducting dipoles made with phenolic inserts

    CERN Document Server

    Boschmann, H; Dubbeldam, R L; Kirby, G A; Lucas, J; Ostojic, R; Russenschuck, Stephan; Siemko, A; Taylor, T M; Vanenkov, I; Weterings, W

    1998-01-01

    Two identical single aperture 1-metre superconducting dipoles have been built in collaboration with HMA Power Systems and tested at CERN. The 87.8 mm aperture magnets feature a single layer coil wound using LHC main dipole outer layer cable, phenolic spacer type collars, and a keyed two part structural iron yoke. The magnets are designed as models of the D1 separation dipole in the LHC experimental insertions, whose nominal field is 4.5 T at 4.5 K. In this report we present the test results of the two magnets at 4.3 K and 1.9 K.

  18. TR-PIV Performance Test for a Flow Field Measurement in a Single Rod Test Section

    International Nuclear Information System (INIS)

    Park, Ju Yong; Shin, Chang Hwan; Lee, Chi Young; Oh, Dong Seok; In, Wang Kee

    2011-01-01

    To substantially enhance the performance of pressurized water reactors (PWRs), a dual-cooled fuel is being developed at the Korea Atomic Energy Research Institute (KAERI). This fuel is ring-shaped, unlike conventional cylindrical nuclear fuel, and cooling water flows through both the inner and the outer channels. The design widens the surface area, but it also enlarges the outer diameter of the fuel rods, so the gap between fuel rods narrows and the outer channel flow becomes unstable. Measuring the turbulent flow and the perturbations that influence heat transfer enhancement is therefore important. To understand heat transfer characteristics under turbulence, the flow perturbation components must be measured. The hot wire anemometer is widely used to measure such turbulence characteristics, but it has disadvantages such as low probe durability and large probe size. For these reasons, a TR-PIV (Time-Resolved Particle Image Velocimetry) system is employed for better flow measurement at our research institute. The TR-PIV system consists of a laser system and a high-speed camera, both operating at high frequency, and was therefore judged capable of measuring complicated turbulent flow and perturbations. This paper introduces the TR-PIV system, presents results acquired with it for flow in a single rod test section, and outlines plans for its future practical use.

  19. Item Modeling Concept Based on Multimedia Authoring

    Directory of Open Access Journals (Sweden)

    Janez Stergar

    2008-09-01

    In this paper a modern item design framework for computer based assessment, built on the Flash authoring environment, is introduced. Question design is discussed, with emphasis on the multimedia authoring environment used for item modeling. Item type templates are a structured means of collecting and storing item information that can be used to improve the efficiency and security of the innovative item design process. Templates can modernize item design and enhance and speed up the development process. Along with content creation, multimedia has vast potential for use in innovative testing. The introduced item design template is based on a taxonomy of innovative items, which have great potential for expanding the content areas and construct coverage of an assessment. The presented item design approach is based on two GUIs: one for question design based on implemented item design templates and one for user interaction tracking/retrieval. The concept of user interfaces based on Flash technology is discussed, as well as the implementation of the item design forms with multimedia authoring. An innovative method for user interaction storage/retrieval, based on PHP extending Flash capabilities in the proposed framework, is also introduced.

  20. Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

    Science.gov (United States)

    Magis, David; Facon, Bruno

    2013-01-01

    Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…

  1. Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

    Directory of Open Access Journals (Sweden)

    Boyer François

    2010-03-01

    Background: Patient-reported outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT), based on the observed scores, and models coming from Item Response Theory (IRT). However, whether IRT or CTT would be the most appropriate method to analyse PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known (for item parameters, to a varying extent, from good to poor precision) or unknown and therefore had to be estimated. The powers obtained with IRT or CTT were compared and the parameters having the strongest impact on them were identified. Results: When person parameters were assumed to be unknown and item parameters to be either known or not, the powers achieved using IRT or CTT were similar and always lower than the expected power using the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take account of the number of items to obtain an accurate formula.
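
    The "well-known sample size formula for normally distributed endpoints" mentioned in the Results is presumably the usual two-group normal approximation, n per group = 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardized effect size. A minimal sketch under that assumption follows; the function names are illustrative and the abstract does not state which exact variant the authors used.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def expected_power(n, effect_size, alpha=0.05):
    """Approximate power with n subjects per group (normal approximation)."""
    z = NormalDist()
    return z.cdf(effect_size * math.sqrt(n / 2) - z.inv_cdf(1 - alpha / 2))
```

    For example, `n_per_group(0.5)` gives 63 subjects per group for 80% power at a two-sided 5% level. The abstract's finding is that the power actually achieved with PRO scores fell below what this kind of formula predicts.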

  2. Site characterization and validation - equipment design and techniques used in single borehole hydraulic testing, simulated drift experiment and crosshole testing

    International Nuclear Information System (INIS)

    Holmes, D.C.; Sehlstedt, M.

    1991-10-01

    This report describes the equipment and techniques used to investigate the variation of hydrogeological parameters within a fractured crystalline rock mass. The testing program was performed during stage 3 of the site characterization and validation programme at the Stripa mine in Sweden. This programme used a multidisciplinary approach, combining geophysical, geological and hydrogeological methods, to determine how groundwater moved through the rock mass. The hydrogeological work package involved three components. Firstly, novel single borehole techniques (focused packer testing) were used to determine the distribution of hydraulic conductivity and head along individual boreholes. Secondly, water was abstracted from boreholes which were drilled to simulate a tunnel (simulated drift experiment). Locations and magnitudes of flows were measured together with pressure responses at various points in the SCV rock mass. Thirdly, small scale crosshole tests, involving detailed interference testing, were used to determine the variability of hydrogeological parameters within previously identified, significant flow zones. (au)

  3. In-flight and ground testing of single event upset sensitivity in static RAMs

    International Nuclear Information System (INIS)

    Johansson, K.; Dyreklev, P.; Granbom, B.; Calvet, C.; Fourtine, S.; Feuillatre, O.

    1998-01-01

    This paper presents the results from in-flight measurements of single event upsets (SEU) in static random access memories (SRAM) caused by the atmospheric radiation environment at aircraft altitudes. The memory devices were carried on commercial airlines at high altitude and mainly high latitudes. The SEUs were monitored by a Component Upset Test Equipment (CUTE), designed for this experiment. The in-flight results are compared to ground-based testing with neutrons from three different sources.

  4. Testing and verification of a novel single-channel IGBT driver circuit

    OpenAIRE

    Lukić, Milan; Ninković, Predrag

    2016-01-01

    This paper presents a novel single-channel IGBT driver circuit together with a procedure for testing and verification. It is based on a specialized integrated circuit with complete range of protective functions. Experiments are performed to test and verify its behaviour. Experimental results are presented in the form of oscilloscope recordings. It is concluded that the new driver circuit is compatible with modern IGBT transistors and power converter demands and that it can be applied in new d...

  5. Hydrogeological study of single water conducting fracture using a crosshole hydraulic test apparatus

    International Nuclear Information System (INIS)

    Yamamoto, Hajime; Shimo, Michito; Yamamoto, Takuya

    1998-03-01

    The Crosshole Injection Test Apparatus has been constructed to evaluate the hydraulic properties and conditions, such as hydraulic conductivity and its anisotropy, storage coefficient, pore pressure etc., within rock near a drift. Construction started in FY93 and was completed in August of FY96, yielding a set of equipment for crosshole hydraulic testing composed of one injection borehole instrument, one observation borehole instrument and a set of on-ground instruments. In FY96, an in-situ feasibility test was conducted at a 550 m level drift in the Kamaishi In Situ Test Site, which has been operated by PNC, and the performance of the equipment and its applicability to various types of injection method were confirmed. This year, a hydrogeological investigation of a single water-conducting fracture was conducted at a 250 m level drift in the Kamaishi In Situ Test Site, using two boreholes, KCH-3 and KCH-4, both of which are 30 m deep and inclined by 45 degrees from the surface. Pressure responses at the KCH-3 borehole during the drilling of the KCH-4 borehole, together with the results of borehole TV logging and core observation, indicated that a major conductive single fracture was successfully isolated by the packers. From a series of single-hole and crosshole tests (sinusoidal and constant flow-rate tests), the hydraulic parameters of the single fracture (such as hydraulic conductivity and storage coefficient) were determined. This report presents all the test results and analysed data, and also describes the hydrogeological structure near the drift. (author)

  6. Psychometric characteristics of single-word tests of children's speech sound production.

    Science.gov (United States)

    Flipsen, Peter; Ogiela, Diane A

    2015-04-01

    Our understanding of test construction has improved since the now-classic review by McCauley and Swisher (1984). The current review article examines the psychometric characteristics of current single-word tests of speech sound production in an attempt to determine whether our tests have improved since then. It also provides a resource that clinicians may use to help them make test selection decisions for their particular client populations. Ten tests published since 1990 were reviewed to determine whether they met the 10 criteria set out by McCauley and Swisher (1984), as well as 7 additional criteria. All of the tests reviewed met at least 3 of McCauley and Swisher's (1984) original criteria, and 9 of 10 tests met at least 5 of them. Most of the tests met some of the additional criteria as well. The state of the art for single-word tests of speech sound production in children appears to have improved in the last 30 years. There remains, however, room for improvement.

  7. Reliability and validity of the Spanish version of the 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) in young adults

    Directory of Open Access Journals (Sweden)

    García-Campayo Javier

    2011-08-01

    Background: The 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) is an instrument for measuring resilience that has shown good psychometric properties in its original version in English. The aim of this study was to evaluate the validity and reliability of the Spanish version of the 10-item CD-RISC in young adults and to verify whether it is structured in a single dimension as in the original English version. Findings: Cross-sectional observational study including 681 university students ranging in age from 18 to 30 years. The number of latent factors in the 10 items of the scale was analyzed by exploratory factor analysis. Confirmatory factor analysis was used to verify whether a single factor underlies the 10 items of the scale as in the original version in English. The convergent validity was analyzed by testing whether the mean of the scores of the mental component of SF-12 (MCS) and the quality of sleep as measured with the Pittsburgh Sleep Index (PSQI) were higher in subjects with better levels of resilience. The internal consistency of the 10-item CD-RISC was estimated using the Cronbach α coefficient and test-retest reliability was estimated with the intraclass correlation coefficient. The Cronbach α coefficient was 0.85 and the test-retest intraclass correlation coefficient was 0.71. The mean MCS score and the level of quality of sleep in both men and women were significantly worse in subjects with lower resilience scores. Conclusions: The Spanish version of the 10-item CD-RISC showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience. Our study confirmed that a single factor underlies the resilience construct, as was the case of the original scale in English.
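
    The internal consistency figure reported here is Cronbach's α, which for k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch of that computation follows; the function name and the respondents-by-items input layout are assumptions, and the study's own data are not reproduced.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [
        statistics.variance([row[j] for row in scores]) for j in range(k)
    ]
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

    When every item ranks respondents identically the coefficient reaches 1, and it falls as the items share less common variance; the value of 0.85 reported above lies in the range usually read as good internal consistency.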

  8. A single well pumping and recovery test to measure in situ acrotelm transmissivity in raised bogs

    NARCIS (Netherlands)

    Schaaf, van der S.

    2004-01-01

    A quasi-steady-state single pit pumping and recovery test to measure in situ the transmissivity of the highly permeable upper layer of raised bogs, the acrotelm, is described and discussed. The basic concept is the expanding depression cone during both pumping and recovery. It is shown that applying

  9. Should the diagnosis of COPD be based on a single spirometry test?

    NARCIS (Netherlands)

    Schermer, T.R.; Robberts, B.; Crockett, A.J.; Thoonen, B.P.; Lucas, A.; Grootens, J.; Smeele, I.J.; Thamrin, C.; Reddel, H.K.

    2016-01-01

    Clinical guidelines indicate that a chronic obstructive pulmonary disease (COPD) diagnosis is made from a single spirometry test. However, long-term stability of diagnosis based on forced expiratory volume in 1 s over forced vital capacity (FEV1/FVC) ratio has not been reported. In primary care

  10. An improved single sensor parity space algorithm for sequential probability ratio test

    Energy Technology Data Exchange (ETDEWEB)

    Racz, A. [Hungarian Academy of Sciences, Budapest (Hungary). Atomic Energy Research Inst.

    1995-12-01

    In our paper we propose a modification of the single sensor parity algorithm in order to make the statistical properties of the generated residual determinable in advance. The algorithm is tested via computer simulated ramp failure at the temperature readings of the pressurizer. (author).

  11. High-energy heavy ion testing of VLSI devices for single event ...

    Indian Academy of Sciences (India)

    Unknown

    per describes the high-energy heavy ion radiation testing of VLSI devices for single event upset (SEU) ... The experimental set up employed to produce low flux of heavy ions viz. silicon ... through which they pass, leaving behind a wake of elec- ... for use in Bus Management Unit (BMU) and bulk CMOS ... was scheduled.

  12. Single-Cylinder Diesel Engine Tests with Unstabilized Water-in-Fuel Emulsions

    Science.gov (United States)

    1978-08-01

    A single-cylinder, four-stroke cycle diesel engine was operated on unstabilized water-in-fuel emulsions. Two prototype devices were used to produce the emulsions on-line with the engine. More than 350 test points were run with baseline diesel fuel an...

  13. Utilizing Response Time Distributions for Item Selection in CAT

    Science.gov (United States)

    Fan, Zhewen; Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey

    2012-01-01

    Traditional methods for item selection in computerized adaptive testing only focus on item information without taking into consideration the time required to answer an item. As a result, some examinees may receive a set of items that take a very long time to finish, and information is not accrued as efficiently as possible. The authors propose two…

  14. Psychometric properties of WISC-III items

    Directory of Open Access Journals (Sweden)

    Vera Lúcia Marques de Figueiredo

    2008-09-01

    A test is improved through the selection, substitution or revision of items, and analyzing an item increases the validity and precision of the test. This article presents results on the psychometric properties of the items of the WISC-III subtests with respect to difficulty, discrimination and validity. The WISC-III is widely used in the assessment of intelligence, and knowing the quality of its items is essential for the professional who administers the test. The analyses were based on the scores of 801 test protocols administered during research on adapting the test to a Brazilian context. The analyses showed that the adapted items had adequate psychometric characteristics, allowing the instrument to be used as a reliable means of diagnosis.

  15. A high frequency test bench for rapid single-flux-quantum circuits

    International Nuclear Information System (INIS)

    Engseth, H; Intiso, S; Rafique, M R; Tolkacheva, E; Kidiyarova-Shevchenko, A

    2006-01-01

    We have designed and experimentally verified a test bench for high frequency testing of rapid single-flux-quantum (RSFQ) circuits. This test bench uses an external tunable clock signal that is stable in amplitude, phase and frequency. The high frequency external clock reads out the clock pattern stored in a long shift register. The clock pattern is consequently shifted out at high speed and split to feed both the circuit under test and an additional shift register in the test bench for later verification at low speed. This method can be employed for reliable high speed verification of RSFQ circuit operation, with use of only low speed read-out electronics. The test bench consists of 158 Josephson junctions and the occupied area is 3300 × 660 μm². It was experimentally verified up to 33 GHz with ± 21.7% margins on the global bias supply current.

  16. Computer-aided, single-specimen controlled bending test for fracture-kinetics measurement in ceramics

    International Nuclear Information System (INIS)

    Borovik, V.G.; Chushko, V.M.; Kovalev, S.P.

    1995-01-01

    Fracture testing of ceramics by using controlled crack growth is proposed to allow study of crack-kinetics behavior under a given loading history. A computer-aided, real-time data acquisition system improves the quality of crack-growth parameters obtained in a simple, single-specimen bend test. Several ceramic materials were tested in the present study: aluminum nitride as a linear-elastic material; and alumina and yttria-stabilized zirconia, both representative of ceramics with microstructure-dependent nonlinear fracture properties. Ambiguities in the crack-growth diagrams are discussed to show the importance of accounting for crack-growth history in correctly describing nonequilibrium fracture behavior

  17. Determination of chlorinated hydrocarbons in single and multi component test gases

    Energy Technology Data Exchange (ETDEWEB)

    Giese, U.; Stenner, H. (Paderborn Univ. (Gesamthochschule) (Germany, F.R.). Angewandte Chemie); Ludwig, E.; Kettrup, A. (Paderborn Univ. (Gesamthochschule) (Germany, F.R.). Angewandte Chemie Gesellschaft fuer Strahlen- und Umweltforschung mbH Muenchen, Neuherberg (Germany, F.R.). Inst. fuer Oekologische Chemie)

    1990-11-01

    To compare the efficiency of active and diffusive sampling methods, two diffusive samplers with different properties were used to determine chlorinated hydrocarbons (CH₂Cl₂, CHCl₃, CCl₄) in single and multi component test gas mixtures. One of the chosen diffusive samplers can also be used for active sampling. In general, good correlations between all tested methods were observed in the direct comparison of active and diffusive sampling and in the determination of the efficiencies. When active and diffusive sampling methods were applied to multi component test gases, no interferences between the analytes could be ascertained. (orig.).

  18. Tank selection for Light Duty Utility Arm (LDUA) system hot testing in a single shell tank

    Energy Technology Data Exchange (ETDEWEB)

    Bhatia, P.K.

    1995-01-31

    The purpose of this report is to recommend a single shell tank in which to hot test the Light Duty Utility Arm (LDUA) for the Tank Waste Remediation System (TWRS) in Fiscal Year 1996. The LDUA is designed to utilize a 12 inch riser. During hot testing, the LDUA will deploy two end effectors (a High Resolution Stereoscopic Video Camera System and a Still/Stereo Photography System mounted on the end of the arm's tool interface plate). In addition, three other systems (an Overview Video System, an Overview Stereo Video System, and a Topographic Mapping System) will be independently deployed and tested through 4 inch risers.

  19. Discussions On Worst-Case Test Condition For Single Event Burnout

    Science.gov (United States)

    Liu, Sandra; Zafrani, Max; Sherman, Phillip

    2011-10-01

    This paper discusses the failure characteristics of single-event burnout (SEB) in power MOSFETs based on an analysis of quasi-stationary avalanche simulation curves. The analyses show that the worst-case test condition for SEB is to use the ion with the highest mass, which results in the highest transient current due to charge deposition and displacement damage. The analyses also show that it is possible to build power MOSFETs that do not exhibit SEB even when tested with the heaviest ion, which has been verified by heavy ion test data on SEB-sensitive and SEB-immune devices.

  20. Tank selection for Light Duty Utility Arm (LDUA) system hot testing in a single shell tank

    International Nuclear Information System (INIS)

    Bhatia, P.K.

    1995-01-01

    The purpose of this report is to recommend a single shell tank in which to hot test the Light Duty Utility Arm (LDUA) for the Tank Waste Remediation System (TWRS) in Fiscal Year 1996. The LDUA is designed to utilize a 12 inch riser. During hot testing, the LDUA will deploy two end effectors (a High Resolution Stereoscopic Video Camera System and a Still/Stereo Photography System mounted on the end of the arm's tool interface plate). In addition, three other systems (an Overview Video System, an Overview Stereo Video System, and a Topographic Mapping System) will be independently deployed and tested through 4 inch risers

  1. Mechanical Properties of Commercial Carbon Fibers Using a Single Filament Tensile Test

    International Nuclear Information System (INIS)

    Joh, Han-Ik; Song, Hae Kyung; Ku, Bon-Cheol; Lee, Sungho; Kim, Ki-Young; Kang, Phil-Hyun

    2013-01-01

    In this study, the mechanical properties of commercial carbon fibers were evaluated using a single filament tensile test with various fiber gauge lengths. Tensile strength increased significantly with decreasing specimen length, possibly due to small defect sites. The compliance method provided more accurate moduli of the carbon fibers by removing system errors from the single filament tensile test. The Weibull modulus revealed that shorter specimens had an inhomogeneous defect distribution, leading to a higher tensile strength and a higher standard deviation of strength. X-ray diffractograms of the carbon fibers showed similar crystallinity and orientation despite significant differences in fiber modulus and strength, indicating that the differences in tensile properties were not attributable to the crystalline structure of the commercial carbon fibers used in the study.
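
    The abstract does not say how the Weibull modulus was obtained. One common approach, sketched here as an assumption, is a linear fit of ln(-ln(1 - F)) against ln(strength), where F is the estimated failure probability of each ranked specimen. The estimator F_i = (i - 0.5)/n and the function name `weibull_fit` are illustrative choices, not taken from the study.

```python
import math

def weibull_fit(strengths):
    """Estimate Weibull modulus m and scale sigma0 by least squares.

    Uses the linearized two-parameter Weibull form
        ln(-ln(1 - F)) = m * ln(sigma) - m * ln(sigma0)
    with the (assumed) probability estimator F_i = (i - 0.5) / n.
    Needs at least two distinct strength values.
    """
    s = sorted(strengths)
    n = len(s)
    xs = [math.log(v) for v in s]
    ys = [math.log(-math.log(1 - (i - 0.5) / n)) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx                          # slope = Weibull modulus
    sigma0 = math.exp(x_bar - y_bar / m)   # from intercept = -m * ln(sigma0)
    return m, sigma0
```

    A lower modulus means a wider scatter in strength, so comparing `m` across gauge lengths is one way to express the change in defect distribution that the abstract describes.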

  2. MRPC prototypes for NeuLAND tested using the single electron mode of ELBE/Dresden

    Energy Technology Data Exchange (ETDEWEB)

    Yakorev, Dmitry; Bemmerer, Daniel; Elekes, Zoltan; Kempe, Mathias; Stach, Daniel; Wagner, Andreas [Forschungszentrum Dresden-Rossendorf (FZD), Dresden (Germany); Aumann, Tom; Boretzky, Konstanze; Caesar, Christoph; Ciobanu, Mircea; Hehner, Joerg; Heil, Michael; Nusair, Omar; Reifarth, Rene; Simon, Haik [GSI, Darmstadt (Germany); Elvers, Michael; Maroussov, Vassili; Zilges, Andreas [Universitaet Koeln (Germany); Zuber, Kai [TU Dresden (Germany)

    2010-07-01

    The NeuLAND detector at the R³B experiment at the future FAIR facility in Darmstadt aims to detect fast neutrons (0.2-1.0 GeV) with high time and spatial resolutions (σ_t < 100 ps, σ_{x,y,z} < 1 cm). Prototypes for the NeuLAND detector have been built at FZD and GSI and then studied using the 32 MeV pulsed electron beam at the superconducting electron accelerator ELBE in Dresden, Germany. Owing to the new, single-electron per bunch mode of operation, a rapid validation of the design criteria (≥ 90% efficiency for minimum ionizing particles, σ ≤ 100 ps time resolution) was possible. Tested properties of the prototypes include glass thickness, spacing of the central anode, and a comparison of single-ended and differential readout. Tested frontend electronics schemes include FOPI (single-ended), PADI-based (both single-ended and differential mode tested), and ALICE (differential).

  3. Experimental results of single screw mechanical tests: a follow-up to SAND2005-6036.

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sandwook; Lee, Kenneth L.; Korellis, John S.; McFadden, Sam X.

    2006-08-01

    The work reported here was conducted to address issues raised regarding mechanical testing of attachment screws described in SAND2005-6036, as well as to increase the understanding of screw behavior through additional testing. Efforts were made to evaluate fixture modifications and address issues of interest, including: fabrication of 45° test fixtures, measurement of the frictional load from the angled fixture guide, employment of electromechanical displacement transducers, development of a single-shear test, and study of the effect of thread start orientation on single-shear behavior. A286 and 302HQ No. 10-32 socket-head cap screws were tested at orientations of 0°, 45°, 60°, 75° and 90° with respect to the primary loading axis, at stroke speeds of 0.001 and 10 in/sec. The frictional load resulting from the angled screw fixture guide was insignificant. Load-displacement curves of A286 screws did not show a minimum value in displacement to failure (DTF) for 60° shear tests. Tests of 302HQ screws did not produce a consistent trend in DTF with load angle. The effect of displacement rate on DTF became larger as shear angle increased for both A286 and 302HQ screws.

  4. A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

    Science.gov (United States)

    Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

    2018-04-10

    To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.

  5. WORKING MEMORY CAPACITY TEST REVEALS SUBJECTS DIFFICULTIES MANAGING LIMITED CAPACITY

    Directory of Open Access Journals (Sweden)

    R V Ershova

    2016-12-01

    Free recall consists of two separate stages: the emptying of working memory and reactivation [5]. The Tarnow Unchunkable Test (TUT) [7] uses double-integer items to isolate the first stage by making items difficult to reactivate, owing to the lack of intra-item relationships. A total of 193 Russian college students were tested via the internet version of the TUT. The average number of items remembered in the 3-item test was 2.54. In the 4-item test, the average number of items decreased to 2.38. This, and a number of other qualitative distribution differences between the 3- and 4-item tests, indicates that the average capacity limit of working memory is reached at 3 items. This provides the first direct measurement of the unchunkable capacity limit for number items. Difficulties in managing working memory occurred, as most subjects remembered less as the number of items increased beyond capacity and failed to remember a single item in at least one out of three 4-item trials. The Pearson correlation between the total recall of 3 and 4 items was a small 38%.

  6. Single-task and dual-task tandem gait test performance after concussion.

    Science.gov (United States)

    Howell, David R; Osternig, Louis R; Chou, Li-Shan

    2017-07-01

    To compare single-task and dual-task tandem gait test performance between athletes after concussion and controls on observer-timed, spatio-temporal, and center-of-mass (COM) balance control measurements. Ten participants (19.0 ± 5.5 years) were prospectively identified and completed a tandem gait test protocol within 72 h of concussion and again 1 week, 2 weeks, 1 month, and 2 months post-injury. Seven uninjured controls (20.0 ± 4.5 years) completed the same protocol in similar time increments. Tandem gait test trials were performed with (dual-task) and without (single-task) concurrently performing a cognitive test as whole-body motion analysis was performed. Outcome variables included test completion time, average tandem gait velocity, cadence, and whole-body COM frontal plane displacement. Concussion participants took significantly longer to complete the dual-task tandem gait test than controls throughout the first 2 weeks post-injury (mean time = 16.4 [95% CI: 13.4-19.4] vs. 10.1 [95% CI: 6.4-13.7] seconds; p = 0.03). Single-task tandem gait times were significantly lower 72 h post-injury (p = 0.04). Dual-task cadence was significantly lower for concussion participants than controls (89.5 [95% CI: 68.6-110.4] vs. 127.0 [95% CI: 97.4-156.6] steps/minute; p = 0.04). Moderately-high to high correlations between tandem gait test time and whole-body COM medial-lateral displacement were detected at each time point during dual-task gait (r_s = 0.70-0.93; p = 0.03-0.001). Adding a cognitive task during the tandem gait test resulted in longer detectable deficits post-concussion compared to the traditional single-task tandem gait test. As a clinical tool to assess dynamic motor function, tandem gait may assist with return to sport decisions after concussion. Copyright © 2017 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  7. Survey report for fiscal 1999. Evaluation and analysis on items applied for in high-performance industrial furnace introduction field test project in fiscal 1999; 1999 nendo koseino kogyoro donyu field test jigyo chosa hokokusho. Obo anken hyoka bunseki

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-03-01

    This report presents a comprehensive evaluation and analysis, mainly of the 51 items adopted in the high-performance industrial furnace introduction field tests. Fundamental development research on the high-performance industrial furnace was completed in fiscal 1998, verifying that energy savings of more than 30% and a 50% reduction in NOx emissions relative to conventional furnaces are possible. Building on this achievement, the field test project was started as a comprehensive three-year effort, from fiscal 1998 to fiscal 2000, to develop practically usable technologies, and it is being promoted as a joint research project. According to a fiscal 1998 survey of actual conditions, a little fewer than 20,000 industrial furnaces (excluding boilers) with a combustion capacity of more than 500,000 kcal/hr are in use. If these furnaces were converted to high-performance industrial furnaces, energy savings of 210,000 × 10⁶ Mcal (equivalent to 22.7 × 10⁶ kl/year of crude oil) could be achieved out of a maximum annual energy consumption of 700,000 × 10⁶ Mcal (equivalent to 75.7 × 10⁶ kl/year of crude oil). This saving corresponds to about 12% of the final energy consumption of the entire Japanese industrial sector in fiscal 1996. It is expected that the performance of full-size high-performance industrial furnaces will be verified and that the technology will be widely adopted. (NEDO)
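
    The energy figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only restates the quoted numbers; the variable names are illustrative, and the conversion factor is derived from the abstract's own totals, not taken from an external standard.

```python
# Annual figures quoted in the abstract for the convertible furnaces.
total_mcal = 700_000e6     # maximum annual energy consumption, Mcal
saved_mcal = 210_000e6     # projected annual saving, Mcal
total_oil_kl = 75.7e6      # crude oil equivalent of the total, kl/year
saved_oil_kl = 22.7e6      # crude oil equivalent of the saving, kl/year

saving_fraction = saved_mcal / total_mcal   # share of energy saved
oil_fraction = saved_oil_kl / total_oil_kl  # same share via oil equivalents
mcal_per_kl = total_mcal / total_oil_kl     # implied Mcal per kl of crude oil

print(f"saving (Mcal basis): {saving_fraction:.1%}")
print(f"saving (oil basis):  {oil_fraction:.1%}")
print(f"implied conversion:  {mcal_per_kl:.0f} Mcal/kl")
```

    Both ratios come out at about 30%, which matches the energy saving verified in the fundamental research stage, so the projected total assumes every surveyed furnace achieves that saving.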

  8. Single well surfactant test to evaluate surfactant floods using multi tracer method

    Science.gov (United States)

    Sheely, Clyde Q.

    1979-01-01

    Data useful for evaluating the effectiveness of, or designing, an enhanced recovery process, said process involving mobilizing and moving hydrocarbons through a hydrocarbon bearing subterranean formation from an injection well to a production well by injecting a mobilizing fluid into the injection well, comprising (a) determining hydrocarbon saturation in a volume in the formation near a well bore penetrating the formation, (b) injecting sufficient mobilizing fluid to mobilize and move hydrocarbons from a volume in the formation near the well bore, and (c) determining the hydrocarbon saturation in a volume including at least a part of the volume of (b) by an improved single well surfactant method comprising injecting 2 or more slugs of water containing the primary tracer separated by water slugs containing no primary tracer. Alternatively, the plurality of ester tracers can be injected in a single slug, said tracers penetrating varying distances into the formation, wherein the esters have different partition coefficients and essentially equal reaction times. The single well tracer method employed is disclosed in U.S. Pat. No. 3,623,842. This method, designated the single well surfactant test (SWST), is useful for evaluating the effect of surfactant floods, polymer floods, carbon dioxide floods, micellar floods, caustic floods and the like in subterranean formations in much less time and at much reduced cost compared to conventional multiwell pilot tests.

  9. Testing of a single-polarity piezoresistive three-dimensional stress-sensing chip

    International Nuclear Information System (INIS)

    Gharib, H H; Moussa, W A

    2013-01-01

    A new piezoresistive stress-sensing rosette is developed to extract the components of the three-dimensional (3D) stress tensor using single-polarity (n-type) piezoresistors. This paper presents the testing of a micro-fabricated sensing chip utilizing the developed single-polarity rosette. The testing is conducted using a four-point bending of a chip-on-beam to induce five controlled stress components, which are analyzed both numerically and experimentally. Numerical analysis using finite element analysis is conducted to study the levels of the induced stress components at three rosette-sites and the levels of the stress field non-uniformities, and to simulate the extracted stress components from the sensing rosette. The experimental analysis applied tensile and compressive loads over three rosette-sites at different load increments. The experimentally extracted stress components show good linearity with the applied load and values close to the numerical model. (paper)

  10. Probabilistic Modeling of Updating Epistemic Uncertainty In Pile Capacity Prediction With a Single Failure Test Result

    Directory of Open Access Journals (Sweden)

    Indra Djati Sidi

    2017-12-01

    The model error N has been introduced to denote the discrepancy between the measured and predicted capacity of a pile foundation. This model error is recognized as epistemic uncertainty in pile capacity prediction. The statistics of N have been evaluated based on data gathered from various sites and may be considered only as a general error trend in capacity prediction, providing crude estimates of the model error in the absence of more specific data from the site. The results of even a single load test to failure should provide direct evidence of the pile capacity at a given site. Bayes' theorem has been used as a rational basis for combining new data with previous data to revise the assessment of uncertainty and reliability. This study is devoted to the development of procedures for updating the model error (N), and subsequently the predicted pile capacity, with the result of a single failure test.
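
    The abstract does not give the updating procedure itself. As a minimal sketch of the general idea, the code below performs a conjugate normal update on ln N, assuming N is the ratio of measured to predicted capacity and is lognormally distributed. The function name `update_log_bias` and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def update_log_bias(prior_mean, prior_sd, observed_ratio, test_sd):
    """Conjugate normal update of the site mean of ln N after one load test.

    prior_mean, prior_sd: prior on the mean of ln N (from multi-site data).
    observed_ratio: measured / predicted capacity from the failure test.
    test_sd: assumed scatter of ln N for a single test at the site.
    """
    y = math.log(observed_ratio)
    w_prior = 1.0 / prior_sd**2   # precision of the prior
    w_test = 1.0 / test_sd**2     # precision of the single observation
    post_var = 1.0 / (w_prior + w_test)
    post_mean = post_var * (w_prior * prior_mean + w_test * y)
    return post_mean, math.sqrt(post_var)

# Hypothetical example: unbiased prior, one test showing 20% extra capacity.
post_mean, post_sd = update_log_bias(0.0, 0.4, 1.2, 0.2)
predicted_capacity = 1000.0                                  # kN, hypothetical
updated_capacity = predicted_capacity * math.exp(post_mean)  # median estimate
```

    With these assumed values the single test carries four times the weight of the prior, so the updated capacity moves most of the way toward the tested value while the posterior spread shrinks below both inputs.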

  11. Measurement of single-top cross section and test of anomalous $Wtb$ coupling

    Energy Technology Data Exchange (ETDEWEB)

    Jung, Ji-Eun [Seoul National Univ. (Korea, Republic of)

    2010-01-01

    The top quark is most often produced in $t\bar{t}$ pairs via the strong interaction; however, electroweak production of a single top quark is also possible. Electroweak single-top production is more difficult to observe than $t\bar{t}$ production. Studying single-top production is important for the following reasons. It provides a direct measurement of the CKM matrix element, and single-top events are also a background to several searches for SM or non-SM signals, such as Higgs boson searches. Information on the spin polarization of the top quark can be used to test anomalous W-t-b coupling. This thesis describes the result of a measurement of the single-top cross-section and a test of anomalous W-t-b coupling using 4.8 fb$^{-1}$ of data collected by the CDF Run II experiment at the Fermilab Tevatron. The measured cross-section is $1.83^{+0.7}_{-0.6}$ pb and the measured limit of $|V_{tb}|$ is 0.41 at 95% CL. The fraction of V+A coupling is 0 ± 28%.

  12. Automated Item Generation with Recurrent Neural Networks.

    Science.gov (United States)

    von Davier, Matthias

    2018-03-12

    Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.

  13. A New Kind of Single-Well Tracer Test for Assessing Subsurface Heterogeneity

    Science.gov (United States)

    Hansen, S. K.; Vesselinov, V. V.; Lu, Z.; Reimus, P. W.; Katzman, D.

    2017-12-01

    Single-well injection-withdrawal (SWIW) tracer tests have historically been interpreted using the idealized assumption of tracer path reversibility (i.e., negligible background flow), with background flow due to natural hydraulic gradient being an un-modeled confounding factor. However, we have recently discovered that it is possible to use background flow to our advantage to extract additional information about the subsurface. To wit: we have developed a new kind of single-well tracer test that exploits flow due to natural gradient to estimate the variance of the log hydraulic conductivity field of a heterogeneous aquifer. The test methodology involves injection under forced gradient and withdrawal under natural gradient, and makes use of a relationship, discovered using a large-scale Monte Carlo study and machine learning techniques, between power law breakthrough curve tail exponent and log-hydraulic conductivity variance. We will discuss how we performed the computational study and derived this relationship and then show an application example in which our new single-well tracer test interpretation scheme was applied to estimation of heterogeneity of a formation at the chromium contamination site at Los Alamos National Laboratory. Detailed core hole records exist at the same site, from which it was possible to estimate the log hydraulic conductivity variance using a Kozeny-Carman relation. The variances estimated using our new tracer test methodology and estimated by direct inspection of core were nearly identical, corroborating the new methodology. Assessment of aquifer heterogeneity is of critical importance to deployment of amendments associated with in-situ remediation strategies, since permeability contrasts potentially reduce the interaction between amendment and contaminant. Our new tracer test provides an easy way to obtain this information.

  14. Test Results Of A Single Aperture Dipole Model Magnet For LHC

    CERN Document Server

    Shintomi, T; Higashi, N; Kimura, N; Ogitsu, T; Tanaka, K; Terashima, A; Tsuchiya, K; Yamamoto, A; Orikasa, A; Makishima, K; Siegel, N; Leroy, D; Perin, R

    1999-01-01

    The 56 mm single aperture superconducting dipole model with a 5-block coil configuration was reassembled and tested to investigate the full support of electromagnetic forces using a high-manganese steel collar structure without mechanical contribution from an iron yoke. The reassembled model, which has a gap between the high manganese steel collar and the horizontally split iron yoke, reached a central field of 9 tesla (93554330f short sample) at the first

  15. To Show or Not to Show: The Effects of Item Stems and Answer Options on Performance on a Multiple-Choice Listening Comprehension Test

    Science.gov (United States)

    Yanagawa, Kozo; Green, Anthony

    2008-01-01

    The purpose of this study is to examine whether the choice between three multiple-choice listening comprehension test formats results in any difference in listening comprehension test performance. The three formats entail (a) allowing test takers to preview both the question stem and answer options prior to listening; (b) allowing test takers to…

  16. Testing and verification of a novel single-channel IGBT driver circuit

    Directory of Open Access Journals (Sweden)

    Lukić Milan

    2016-01-01

    This paper presents a novel single-channel IGBT driver circuit together with a procedure for testing and verification. It is based on a specialized integrated circuit with a complete range of protective functions. Experiments are performed to test and verify its behaviour. Experimental results are presented in the form of oscilloscope recordings. It is concluded that the new driver circuit is compatible with modern IGBT transistors and power converter demands and that it can be applied in new designs. It is part of a new 20 kW industrial-grade boost converter.

  17. Testing of plain and fibrous concrete single cavity prestressed concrete reactor vessel models

    International Nuclear Information System (INIS)

    Oland, C.B.

    1985-01-01

    Two single-cavity prestressed concrete reactor vessel (PCRV) models were fabricated and tested to failure to demonstrate the structural response and ultimate pressure capacity of models cast from high-strength concretes. Concretes with design compressive strengths in excess of 70 MPa (10,000 psi) were developed for this investigation. One model was cast from plain concrete and failed in shear at the head region. The second model was cast from fiber reinforced concrete and failed by rupturing the circumferential prestressing at the sidewall of the structure. The tests also demonstrated the capabilities of the liner system to maintain a leak-tight pressure boundary. 3 refs., 4 figs

  18. Effectiveness Analysis of a Non-Destructive Single Event Burnout Test Methodology

    CERN Document Server

    Oser, P; Spiezia, G; Fadakis, E; Foucard, G; Peronnard, P; Masi, A; Gaillard, R

    2014-01-01

    It is essential to characterize power MOSFETs regarding their tolerance to destructive Single Event Burnouts (SEB). Therefore, several non-destructive test methods have been developed to evaluate the SEB cross-section of power devices. A power MOSFET has been evaluated using a test circuit, designed according to standard non-destructive test methods discussed in the literature. Guidelines suggest a prior adaptation of auxiliary components to the device sensitivity before the radiation test. With the first value chosen for the de-coupling capacitor, the external component initiated destructive events and affected the evaluation of the cross-section. As a result, the influence of auxiliary components on the device cross-section was studied. This paper presents the obtained experimental results, supported by SPICE simulations, to evaluate and discuss how the circuit effectiveness depends on the external components.

  19. Item response theory

    Directory of Open Access Journals (Sweden)

    Eutalia Aparecida Candido de Araujo

    2009-12-01

    Concern with the measurement of psychological traits is long-standing, and many studies and methods have been developed toward this goal. Among these proposals, Item Response Theory (IRT) stands out; it initially arose to address limitations of Classical Test Theory, which is still widely used today in the measurement of psychological traits. The main point of IRT is that it considers each item individually rather than focusing on total scores; therefore, the conclusions depend not only on the test or questionnaire but on each item that composes it. This article presents the theory that revolutionized measurement theory.

  20. Negative affect impairs associative memory but not item memory.

    OpenAIRE

    Bisby, J. A.; Burgess, N.

    2014-01-01

    The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how it interacts with different aspects of memory remains unclear. We performed three experiments to examine the effects of emotion on memory for items and their associations. By presenting neutral and negative items with background contexts, Experiment 1 ...

  1. Tailored Cloze: Improved with Classical Item Analysis Techniques.

    Science.gov (United States)

    Brown, James Dean

    1988-01-01

    The reliability and validity of a cloze procedure used as an English-as-a-second-language (ESL) test in China were improved by applying traditional item analysis and selection techniques. The 'best' test items were chosen on the basis of item facility and discrimination indices, and were administered as a 'tailored cloze.' 29 references listed.…
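
    As a rough illustration of the two classical indices mentioned in this abstract, the sketch below computes item facility (proportion correct) and an upper-minus-lower discrimination index. The abstract does not say which discrimination formula was used; the function names and the split into top and bottom thirds are assumptions for illustration only.

```python
def item_facility(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores):
    """Facility in the top-scoring third minus facility in the bottom third.

    Assumption: thirds are one common convention; other cut-offs
    (for example 27 percent) are also used in practice.
    """
    n = len(item_scores)
    k = max(1, n // 3)  # size of each comparison group
    # Rank examinees from lowest to highest total test score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low = [item_scores[i] for i in order[:k]]
    high = [item_scores[i] for i in order[-k:]]
    return item_facility(high) - item_facility(low)
```

    An item answered correctly only by the strongest examinees yields an index near 1, while an index near 0 suggests the item does not separate strong from weak examinees.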

  2. A scale purification procedure for evaluation of differential item functioning

    NARCIS (Netherlands)

    Khalid, Muhammad Naveed; Glas, Cornelis A.W.

    2014-01-01

    Item bias or differential item functioning (DIF) has an important impact on the fairness of psychological and educational testing. In this paper, DIF is seen as a lack of fit to an item response (IRT) model. Inferences about the presence and importance of DIF require a process of so-called test

  3. Application of Phased Array Ultrasonic Testing (PAUT) on Single V-Butt Weld Integrity Determination

    International Nuclear Information System (INIS)

    Amry Amin Abas; Mohd Kamal Shah Shamsudin; Norhazleena Azaman

    2015-01-01

    Phased Array Ultrasonic Testing (PAUT) utilizes arrays of piezoelectric elements that are embedded in an epoxy base. The benefit of such an array is that beam forming, such as steering and focusing the beam front, becomes possible. This enables scanning patterns such as linear scan, sectorial scan and depth focusing scan to be performed. Ultrasonic phased array systems can potentially be employed in almost any test where conventional ultrasonic flaw detectors have traditionally been used. Weld inspection and crack detection are the most important applications, and these tests are done across a wide range of industries including aerospace, power generation, petrochemical, metal billet and tubular goods suppliers, pipeline construction and maintenance, structural metals, and general manufacturing. Phased arrays can also be effectively used to profile remaining wall thickness in corrosion survey applications. The benefits of PAUT are simplified inspection of components of complex geometry, inspection of components with limited access, testing of welds with multiple angles from a single probe, and increased probability of detection with improved signal-to-noise ratio. This paper compares the results of inspecting several specimens using PAUT with those obtained using digital radiography. The specimens are carbon steel plates welded with a single V-butt weld. Digital radiography is done using a blue imaging plate with an x-ray source. PAUT is done using an Olympus MX2 with a 5 MHz probe consisting of 64 elements. The location, size and length of defects are compared. (author)

  4. Predicting muscle forces during the propulsion phase of single leg triple hop test.

    Science.gov (United States)

    Alvim, Felipe Costa; Lucareli, Paulo Roberto Garcia; Menegaldo, Luciano Luporini

    2018-01-01

    Functional biomechanical tests allow the assessment of musculoskeletal system impairments in a simple way. Muscle force synergies associated with movement can provide additional information for diagnosis. However, such forces cannot be directly measured noninvasively. This study aims to estimate muscle activations and forces exerted during the preparation phase of the single leg triple hop test. Two different approaches were tested: static optimization (SO) and computed muscle control (CMC). As an indirect validation, model-estimated muscle activations were compared with surface electromyography (EMG) of selected hip and thigh muscles. Ten physically healthy active women performed a series of jumps, and ground reaction forces, kinematics and EMG data were recorded. An existing OpenSim model with 92 musculotendon actuators was used to estimate muscle forces. Reflective markers data were processed using the OpenSim Inverse Kinematics tool. Residual Reduction Algorithm (RRA) was applied recursively before running the SO and CMC. For both, the same adjusted kinematics were used as inputs. Both approaches presented similar residual amplitudes. SO showed a closer agreement between the estimated activations and the EMGs of some muscles. Due to inherent EMG methodological limitations, the superiority of SO in relation to CMC can be only hypothesized. It should be confirmed by conducting further studies comparing joint contact forces. The workflow presented in this study can be used to estimate muscle forces during the preparation phase of the single leg triple hop test and allows investigating muscle activation and coordination. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Item response theory - A first approach

    Science.gov (United States)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
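
    As an illustrative sketch of the one-, two- and three-parameter logistic models named in this abstract (the abstract itself gives no formulas), the snippet below uses the common textbook parameterization. The function name `irt_probability` and the symbols `a` (discrimination), `b` (difficulty) and `c` (pseudo-guessing) are assumptions for illustration, and no scaling constant such as D = 1.7 is applied.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a logistic IRT model.

    theta: examinee ability
    b: item difficulty
    a: item discrimination (a = 1 and c = 0 gives the one-parameter model)
    c: lower asymptote, or pseudo-guessing (c = 0 gives the two-parameter model)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# One-, two- and three-parameter models evaluated at the same ability level.
p_1pl = irt_probability(theta=0.5, b=0.0)
p_2pl = irt_probability(theta=0.5, b=0.0, a=1.8)
p_3pl = irt_probability(theta=0.5, b=0.0, a=1.8, c=0.2)
```

    Under this parameterization, an examinee whose ability equals the item difficulty answers correctly with probability (1 + c) / 2, which reduces to 0.5 when there is no guessing.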

  6. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    Science.gov (United States)

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms

  7. Combination of classical test theory (CTT) and item response theory (IRT) analysis to study the psychometric properties of the French version of the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF).

    Science.gov (United States)

    Bourion-Bédès, Stéphanie; Schwan, Raymund; Epstein, Jonathan; Laprevote, Vincent; Bédès, Alex; Bonnet, Jean-Louis; Baumann, Cédric

    2015-02-01

    The study aimed to examine the construct validity and reliability of the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF) according to both classical test and item response theories. The psychometric properties of the French version of this instrument were investigated in a cross-sectional, multicenter study. A total of 124 outpatients with a substance dependence diagnosis participated in the study. Psychometric evaluation included descriptive analysis, internal consistency, test-retest reliability, and validity. The dimensionality of the instrument was explored using a combination of the classical test, confirmatory factor analysis (CFA), and an item response theory analysis, the Person Separation Index (PSI), in a complementary manner. The results of the Q-LES-Q-SF revealed that the questionnaire was easy to administer and the acceptability was good. The internal consistency and the test-retest reliability were 0.9 and 0.88, respectively. All items were significantly correlated with the total score and the SF-12 used in the study. The CFA with one factor model was good, and for the unidimensional construct, the PSI was found to be 0.902. The French version of the Q-LES-Q-SF yielded valid and reliable clinical assessments of the quality of life for future research and clinical practice involving French substance abusers. In response to recent questioning regarding the unidimensionality or bidimensionality of the instrument and according to the underlying theoretical unidimensional construct used for its development, this study suggests the Q-LES-Q-SF as a one-dimension questionnaire in French QoL studies.

  8. Using latent class analysis to estimate the test characteristics of the γ-interferon test, the single intradermal comparative tuberculin test and a multiplex immunoassay under Irish conditions

    DEFF Research Database (Denmark)

    Clegg, Tracy A.; Duignan, Anthony; Whelan, Clare

    2011-01-01

    Considerable effort has been devoted to improving the existing diagnostic tests for bovine tuberculosis (single intradermal comparative tuberculin test [SICTT] and γ-interferon assay [γ-IFN]) and to develop new tests. Previously, the diagnostic characteristics (sensitivity, specificity) have been estimated in populations with defined infection status. However, these approaches can be problematic as there may be few herds in Ireland where freedom from infection is guaranteed. We used latent class models to estimate the diagnostic characteristics of existing (SICTT and γ-IFN) and new (multiplex immunoassay [Enferplex-TB]) diagnostic tests under Irish field conditions where true disease status was unknown. The study population consisted of herds recruited in areas with no known TB problems (2197 animals) and herds experiencing a confirmed TB breakdown (2740 animals). A Bayesian model was developed...

  9. An equivalent ground thermal test method for single-phase fluid loop space radiator

    Directory of Open Access Journals (Sweden)

    Xianwen Ning

    2015-02-01

    Thermal vacuum testing is widely used for the ground validation of spacecraft thermal control systems. However, conduction and convection can be simulated completely in a normal ground pressure environment. With the use of pumped fluid loop thermal control technology on spacecraft, conduction and convection become the main heat transfer behavior between the radiator and the inside of the cabin. As long as the heat transfer behavior between the radiator and outer space can be equivalently simulated at normal pressure, the thermal vacuum test can be replaced by a normal ground pressure thermal test. In this paper, an equivalent normal pressure thermal test method for the spacecraft single-phase fluid loop radiator is proposed. The heat radiation between the radiator and outer space has been equivalently simulated by a combination of a group of refrigerators and a thermoelectric cooler (TEC) array. By adjusting the heat rejection of each device, the relationship between heat flux and surface temperature of the radiator can be maintained. To verify this method, a validating system has been built and experiments have been carried out. The results indicate that the proposed equivalent ground thermal test method can simulate the heat rejection performance of the radiator correctly, and the temperature error between the in-orbit theoretical value and the experimental result for the radiator is less than 0.5 °C, except for the equipment startup period. This provides a potential method for the thermal testing of space systems, especially for extra-large spacecraft that employ a single-phase fluid loop radiator as the thermal control approach.

  10. Resonance fluorescence and quantum jumps in single atoms: Testing the randomness of quantum mechanics

    International Nuclear Information System (INIS)

    Erber, T.; Hammerling, P.; Hockney, G.; Porrati, M.; Putterman, S. (La Jolla Institute, La Jolla, California 92037; Department of Physics, University of California, Los Angeles, California 90024)

    1989-01-01

    When a single trapped 198Hg+ ion is illuminated by two lasers, each tuned to an approximate transition, the resulting fluorescence switches on and off in a series of pulses resembling a bistable telegraph. This intermittent fluorescence can also be obtained by optical pumping with a single laser. Quantum jumps between successive atomic levels may be traced directly with multiple-resonance fluorescence. Atomic transition rates and photon antibunching distributions can be inferred from the pulse statistics and compared with quantum theory. Stochastic tests also indicate that the quantum telegraphs are good random number generators. During periods when the fluorescence is switched off, the radiationless atomic currents that generate the telegraph signals can be adjusted by varying the laser illumination: if this coherent evolution of the wave functions is sustained over sufficiently long time intervals, novel interactive precision measurements, near the limits of the time-energy uncertainty relations, are possible. Copyright 1989 Academic Press, Inc

  11. Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

    Science.gov (United States)

    He, Yong

    2013-01-01

    Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…

  12. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    Science.gov (United States)

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  13. Fractal and Morphological Characteristics of Single Marble Particle Crushing in Uniaxial Compression Tests

    Directory of Open Access Journals (Sweden)

    Yidong Wang

    2015-01-01

    Crushing of rock particles is a phenomenon commonly encountered in geotechnical engineering practice. It is, however, difficult to study the crushing of rock particles using classical theory because the physical structure of the particles is complex and irregular. This paper aims at evaluating the fractal and morphological characteristics of single rock particles. A large number of particle crushing tests are conducted on single rock particles. The force-displacement curves and the particle size distributions (PSD) of crushed particles are analysed based on the particle crushing tests. Particle shape plays an important role in both the micro- and macroscale responses of a granular assembly. The PSD of an assortment of rocks are analysed by fractal methods, and the fractal dimension is obtained. A theoretical formula for particle crushing strength is derived, utilising the fractal model, and a simple method is proposed for predicting the probability of particle survival based on Weibull statistics. Based on a few physical assumptions, simple equations are derived for determining particle crushing energy. The results of applying these equations are tested against the actual experimental data and prove to be very consistent. Fractal theory is therefore applicable to the analysis of particle crushing.
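
    The abstract does not give the survival formula it uses. As a hedged sketch only, the snippet below shows the standard two-parameter Weibull survival function often applied to particle strength; the function name and the parameters `sigma0` (characteristic stress) and `m` (Weibull modulus) are assumptions and may differ from the paper's formulation.

```python
import math


def survival_probability(stress, sigma0, m):
    """Two-parameter Weibull probability that a particle survives a given stress.

    stress: applied characteristic stress on the particle
    sigma0: characteristic stress at which survival drops to exp(-1), about 37 percent
    m: Weibull modulus; larger values mean less scatter in particle strength
    """
    return math.exp(-((stress / sigma0) ** m))
```

    With these definitions, a particle loaded exactly to `sigma0` survives with probability exp(-1) regardless of `m`, which is how `sigma0` is usually read off experimental survival curves.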

  14. SHIPPING OF RADIOACTIVE ITEMS

    CERN Multimedia

    TIS/RP Group

    2001-01-01

    The TIS-RP group informs users that shipping of small radioactive items is normally guaranteed within 24 hours from the time the material is handed in at the TIS-RP service. This time is imposed by the necessary procedures (identification of the radionuclides, determination of dose rate). Massive objects require a longer procedure and will therefore take longer.

  15. Selecting Lower Priced Items.

    Science.gov (United States)

    Kleinert, Harold L.; And Others

    1988-01-01

    A program used to teach moderately to severely mentally handicapped students to select the lower priced items in actual shopping activities is described. Through a five-phase process, students are taught to compare prices themselves as well as take into consideration variations in the sizes of containers and varying product weights. (VW)

  16. Item information and discrimination functions for trinary PCM items

    NARCIS (Netherlands)

    Akkermans, Wies; Muraki, Eiji

    1997-01-01

    For trinary partial credit items the shape of the item information and the item discrimination function is examined in relation to the item parameters. In particular, it is shown that these functions are unimodal if δ2 – δ1 < 4 ln 2 and bimodal otherwise. The locations and values of the maxima are
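
    The unimodal versus bimodal claim can be checked numerically. The sketch below assumes the usual partial credit model parameterization with unit slope, in which the item information equals the conditional variance of the item score; the function names and the grid-based mode count are illustrative choices, not the authors' method.

```python
import math

import numpy as np


def pcm_probs(theta, d1, d2):
    """Category probabilities (scores 0, 1, 2) of a trinary partial credit item."""
    theta = np.asarray(theta, dtype=float)
    # Log-numerators for categories 0, 1 and 2 with step parameters d1 and d2.
    z = np.stack([np.zeros_like(theta), theta - d1, 2.0 * theta - d1 - d2])
    z = z - z.max(axis=0)  # subtract the column maximum for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=0)


def pcm_information(theta, d1, d2):
    """Item information, computed as the variance of the item score given theta."""
    p = pcm_probs(theta, d1, d2)
    k = np.array([0.0, 1.0, 2.0])[:, None]
    mean = (k * p).sum(axis=0)
    return ((k - mean) ** 2 * p).sum(axis=0)


def count_modes(d1, d2):
    """Count strict local maxima of the information function on a fine grid."""
    theta = np.linspace(-10.0, 10.0, 4001)
    info = pcm_information(theta, d1, d2)
    inner = info[1:-1]
    return int(np.sum((inner > info[:-2]) & (inner > info[2:])))


threshold = 4.0 * math.log(2.0)       # about 2.77
modes_close = count_modes(-0.3, 0.7)  # step distance 1.0, below the threshold
modes_far = count_modes(-1.5, 2.5)    # step distance 4.0, above the threshold
```

    Step parameters closer together than 4 ln 2 should give a single peak, while more widely separated steps should give two peaks placed symmetrically around the midpoint of δ1 and δ2.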

  17. A multimedia situational judgment test with a constructed-response item format: Its relationship with personality, cognitive ability, job experience, and academic performance

    NARCIS (Netherlands)

    Oostrom, J.K.; Born, M.Ph.; Serlie, A.W.; Van der Molen, H.T.

    2011-01-01

    Advances in computer technology have created opportunities for the development of a multimedia situational test in which responses are filmed with a webcam. This paper examined the relationship of a so-called webcam test with personality, cognitive ability, job experience, and academic performance.

  18. A Linguistic Analysis of the Sample Numeracy Skills Test Items for Pre-Service Teachers Issued by the Australian Council for Educational Research (ACER)

    Science.gov (United States)

    O'Keeffe, Lisa; O'Halloran, Kay L.; Wignell, Peter; Tan, Sabine

    2017-01-01

    In 2015, the Australian Council for Educational Research (ACER) was tasked with developing literacy and numeracy skills testing for pre-service teachers. All undergraduate and postgraduate trainee teachers are now required to pass these literacy and numeracy tests at some stage on their journey to becoming a teacher; for commencing students from…

  19. Tidal Volume Single Breath Washout of Two Tracer Gases - A Practical and Promising Lung Function Test

    Science.gov (United States)

    Singer, Florian; Stern, Georgette; Thamrin, Cindy; Fuchs, Oliver; Riedel, Thomas; Gustafsson, Per; Frey, Urs; Latzin, Philipp

    2011-01-01

    Background: Small airway disease frequently occurs in chronic lung diseases and may cause ventilation inhomogeneity (VI), which can be assessed by washout tests of inert tracer gas. Using two tracer gases with unequal molar mass (MM) and diffusivity increases specificity for VI in different lung zones. Currently washout tests are underutilised due to the time and effort required for measurements. The aim of this study was to develop and validate a simple technique for a new tidal single breath washout test (SBW) of sulfur hexafluoride (SF6) and helium (He) using an ultrasonic flowmeter (USFM). Methods: The tracer gas mixture contained 5% SF6 and 26.3% He, had similar total MM as air, and was applied for a single tidal breath in 13 healthy adults. The USFM measured MM, which was then plotted against expired volume. USFM and mass spectrometer signals were compared in six subjects performing three SBW. Repeatability and reproducibility of SBW, i.e., area under the MM curve (AUC), were determined in seven subjects performing three SBW 24 hours apart. Results: USFM reliably measured MM during all SBW tests (n = 60). MM from USFM reflected SF6 and He washout patterns measured by mass spectrometer. USFM signals were highly associated with mass spectrometer signals, e.g., for MM, linear regression r-squared was 0.98. Intra-subject coefficient of variation of AUC was 6.8%, and coefficient of repeatability was 11.8%. Conclusion: The USFM accurately measured relative changes in SF6 and He washout. SBW tests were repeatable and reproducible in healthy adults. We have developed a fast, reliable, and straightforward USFM based SBW method, which provides valid information on SF6 and He washout patterns during tidal breathing. PMID: 21423739

  20. Design and cold-air test of single-stage uncooled turbine with high work output

    Science.gov (United States)

    Moffitt, T. P.; Szanca, E. M.; Whitney, W. J.; Behning, F. P.

    1980-01-01

    A solid version of a 50.8 cm single stage core turbine designed for high temperature was tested in cold air over a range of speed and pressure ratio. Design equivalent specific work was 76.84 J/g at an engine turbine tip speed of 579.1 m/sec. At design speed and pressure ratio, the total efficiency of the turbine was 88.6 percent, which is 0.6 point lower than the design value of 89.2 percent. The corresponding mass flow was 4.0 percent greater than design.