test items cml: Topics by WorldWideScience.org

Sample records for test items cml

Item calibration in incomplete testing designs

Directory of Open Access Journals (Sweden)

Norman D. Verhelst

2011-01-01

Full Text Available This study discusses the justifiability of item parameter estimation in incomplete testing designs in item response theory. Marginal maximum likelihood (MML) as well as conditional maximum likelihood (CML) procedures are considered in three commonly used incomplete designs: random incomplete, multistage testing and targeted testing designs. Mislevy and Sheehan (1989) have shown that in incomplete designs the justifiability of MML can be deduced from Rubin's (1976) general theory on inference in the presence of missing data. Their results are recapitulated and extended to more situations. In this study it is shown that for CML estimation the justification must be established in an alternative way, by considering the neglected part of the complete likelihood. The problems with incomplete designs are not generally recognized in practical situations. This is due to the stochastic nature of the incomplete designs, which is not taken into account in standard computer algorithms. For that reason, incorrect uses of standard MML and CML algorithms are discussed.
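As a rough illustration of what CML estimation conditions on, the sketch below computes the conditional probability of a response pattern given the raw score under the Rasch model. The abstract gives no formulas, so this is the textbook form for a complete design with dichotomous items, not the author's procedure; the function names and the difficulty values in `beta` are assumptions made for the example.

```python
import math
from itertools import product

def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_n] for item parameters eps_i = exp(-beta_i)."""
    gamma = [1.0] + [0.0] * len(eps)
    for k, e in enumerate(eps, start=1):
        # Update from high order to low so each item enters a product at most once.
        for r in range(k, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def conditional_prob(x, beta):
    """P(response pattern x | raw score sum(x)) under the Rasch model.

    The person parameter cancels out of this ratio, which is why CML
    can estimate item difficulties without modelling the ability distribution.
    """
    eps = [math.exp(-b) for b in beta]
    numerator = math.prod(e for e, xi in zip(eps, x) if xi == 1)
    return numerator / elementary_symmetric(eps)[sum(x)]

# Hypothetical item difficulties, chosen only for illustration.
beta = [-1.0, 0.0, 0.5, 1.2]

# For each raw score, the conditional probabilities of all patterns sum to 1.
totals = {r: 0.0 for r in range(len(beta) + 1)}
for pattern in product([0, 1], repeat=len(beta)):
    totals[sum(pattern)] += conditional_prob(pattern, beta)
```

In an incomplete design each examinee answers only a subset of the items, so the same computation would run over that subset; whether doing so is justified for a given design is the question the study addresses.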
The Culture Repopulation Ability (CRA) Assay and Incubation in Low Oxygen to Test Antileukemic Drugs on Imatinib-Resistant CML Stem-Like Cells.

Science.gov (United States)

Cheloni, Giulia; Tanturli, Michele

2016-01-01

Chronic myeloid leukemia (CML) is a stem cell-driven disorder caused by the BCR/Abl oncoprotein, a constitutively active tyrosine kinase (TK). Chronic-phase CML patients are treated with impressive efficacy with TK inhibitors (TKi) such as imatinib mesylate (IM). However, rather than definitively curing CML, TKi induce a state of minimal residual disease, due to the persistence of leukemia stem cells (LSC), which are insensitive to this class of drugs. LSC persistence may be due to different reasons, including the suppression of BCR/Abl oncoprotein. It has been shown that this suppression follows incubation in low oxygen under appropriate culture conditions and incubation times. Here we describe the culture repopulation ability (CRA) assay, a non-clonogenic assay capable, together with incubation in low oxygen, of revealing in vitro stem cells endowed with marrow repopulation ability (MRA) in vivo. The CRA assay can be used, before moving to animal tests, as a simple and reliable method for the prescreening of drugs potentially active on CML and other leukemias with respect to their activity on the more immature leukemia cell subsets.
Using automatic item generation to create multiple-choice test items.

Science.gov (United States)

Gierl, Mark J; Lai, Hollis; Turner, Simon R

2012-08-01

Many tests of medical knowledge, from the undergraduate level to the level of certification and licensure, contain multiple-choice items. Although these are efficient in measuring examinees' knowledge and skills across diverse content areas, multiple-choice items are time-consuming and expensive to create. Changes in student assessment brought about by new forms of computer-based testing have created the demand for large numbers of multiple-choice items. Our current approaches to item development cannot meet this demand. We present a methodology for developing multiple-choice items based on automatic item generation (AIG) concepts and procedures. We describe a three-stage approach to AIG and we illustrate this approach by generating multiple-choice items for a medical licensure test in the content area of surgery. To generate multiple-choice items, our method requires a three-stage process. Firstly, a cognitive model is created by content specialists. Secondly, item models are developed using the content from the cognitive model. Thirdly, items are generated from the item models using computer software. Using this methodology, we generated 1248 multiple-choice items from one item model. Automatic item generation is a process that involves using models to generate items using computer technology. With our method, content specialists identify and structure the content for the test items, and computer technology systematically combines the content to generate new test items. By combining these outcomes, items can be generated automatically. © Blackwell Publishing Ltd 2012.
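The third stage described above, in which software combines content from an item model, can be sketched as a template filled with every combination of element values. This is a minimal sketch under stated assumptions: `generate_items`, the stem wording and the element values are all hypothetical, and a real AIG system would also apply the constraints of the cognitive model and generate answer options.

```python
from itertools import product

def generate_items(stem_template, elements):
    """Fill a stem template with every combination of the listed element values."""
    names = list(elements)
    items = []
    for values in product(*(elements[n] for n in names)):
        items.append(stem_template.format(**dict(zip(names, values))))
    return items

# Hypothetical item model; the content is illustrative, not taken from the study.
item_model = (
    "A {age}-year-old patient presents with {symptom} after {procedure}. "
    "What is the best next step?"
)
elements = {
    "age": [25, 60],
    "symptom": ["fever", "abdominal pain", "nausea"],
    "procedure": ["an appendectomy", "a hernia repair"],
}

items = generate_items(item_model, elements)
```

The number of generated stems is the product of the number of values per element (here 2 x 3 x 2), which is how a single item model can yield a large pool of items.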
Evolution of a Test Item

Science.gov (United States)

Spaan, Mary

2007-01-01

This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Respective contribution of CML8 and CML9, two arabidopsis calmodulin-like proteins, to plant stress responses.

Science.gov (United States)

Zhu, Xiaoyang; Perez, Manon; Aldon, Didier; Galaud, Jean-Philippe

2017-05-04

In their natural environment, plants have to continuously face constraints such as biotic and abiotic stresses. To complete their life cycle, plants have to perceive and interpret the nature, but also the strength, of environmental stimuli to activate appropriate physiological responses. It is now well established that signaling pathways are crucial steps in the implementation of rapid and efficient plant responses such as genetic reprogramming. It is also reported that rapid rises in calcium (Ca2+) levels within plant cells participate in these early signaling steps and are essential to coordinate adaptive responses. However, to be informative, calcium increases need to be decoded and relayed by calcium-binding proteins, also referred to as calcium sensors, to carry out the appropriate responses. In a recent study, we showed that CML8, an Arabidopsis calcium sensor belonging to the calmodulin-like (CML) protein family, promotes plant immunity against the phytopathogenic bacteria Pseudomonas syringae pv tomato (strain DC3000). Interestingly, other CML proteins such as CML9 were also reported to contribute to plant immunity using the same pathosystem. In this addendum, we discuss the specific contribution of these 2 CMLs in stress responses.
Chronic myelogenous leukemia (CML)

Science.gov (United States)

CML; Chronic myeloid leukemia; Chronic granulocytic leukemia; Leukemia - chronic granulocytic ... nuclear disaster. It takes many years to develop leukemia from radiation exposure. Most people treated for cancer ...
Variable behavior of iPSCs derived from CML patients for response to TKI and hematopoietic differentiation.

Directory of Open Access Journals (Sweden)

Aurélie Bedel

Full Text Available Chronic myeloid leukemia (CML) found an effective therapy in the treatment of patients with tyrosine kinase inhibitors (TKI), which suppress the BCR-ABL1 oncogene activity. However, the majority of patients achieving remission with TKI still have molecular evidence of disease persistence. Various mechanisms have been proposed to explain the disease persistence and recurrence. One of the hypotheses is that the primitive leukemic stem cells (LSCs) can survive in the presence of TKI. Understanding the mechanisms leading to TKI resistance of the LSCs in CML is a critical issue but is limited by availability of cells from patients. We generated induced pluripotent stem cells (iPSCs) derived from CD34⁺ blood cells isolated from CML patients (CML-iPSCs) as a model for studying LSC survival in the presence of TKI and the mechanisms supporting TKI resistance. Interestingly, CML-iPSCs resisted TKI treatment and their survival did not depend on BCR-ABL1, as for primitive LSCs. Induction of hematopoietic differentiation of CML-iPSC clones was reduced compared to normal clones. Hematopoietic progenitors obtained from iPSCs partially recovered TKI sensitivity. Notably, different CML-iPSCs obtained from the same CML patients were heterogeneous in terms of BCR-ABL1 level and proliferation. Thus, several clones of CML-iPSCs are a powerful model to decipher all the mechanisms leading to LSC survival following TKI therapy and are a promising tool for testing new therapeutic agents.
Selecting Items for Criterion-Referenced Tests.

Science.gov (United States)

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
A power-efficient switchable CML driver at 10 Gbps

Science.gov (United States)

Peipei, Chen; Lei, Li; Huihua, Liu

2016-02-01

High static power limits the application of conventional current-mode logic (CML). This paper presents a power-efficient switchable CML driver, which achieves a significant current saving of 75% compared with conventional ones. Implemented in a 130 nm CMOS technology process, the proposed CML driver occupies an area of only about 0.003 mm² and provides a robust differential signal of 1600 mV for a 10 Gbps optical line terminal (OLT) with a total current of 10 mA. The peak-to-peak jitter is about 4 ps (0.04 T_UI) and the offset voltage is 347.2 mV @ 1600 mV_PP.
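The jitter figure quoted above can be checked against the bit rate: at 10 Gbps one unit interval lasts 100 ps, so 4 ps of peak-to-peak jitter is 0.04 of a unit interval. The helper names below are assumptions introduced for this worked example only.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one bit period in picoseconds for a bit rate given in Gbps."""
    return 1000.0 / bit_rate_gbps

def jitter_in_ui(jitter_ps, bit_rate_gbps):
    """Express a jitter value in picoseconds as a fraction of the unit interval."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)

ui_ps = unit_interval_ps(10)         # 100 ps per bit at 10 Gbps
jitter_ui = jitter_in_ui(4, 10)      # 4 ps / 100 ps = 0.04 UI
```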
Expression, purification and preliminary diffraction studies of CmlS

International Nuclear Information System (INIS)

Latimer, Ryan; Podzelinska, Kateryna; Soares, Alexei; Bhattacharya, Anupam; Vining, Leo C.; Jia, Zongchao; Zechel, David L.

2009-01-01

CmlS from S. venezuelae is a flavin-dependent halogenase that is involved in the biosynthesis of the widely used antibiotic chloramphenicol. Here, the crystallization of CmlS and analysis of the initial diffraction data are reported. CmlS, a flavin-dependent halogenase (FDH) present in the chloramphenicol-biosynthetic pathway in Streptomyces venezuelae, directs the dichlorination of an acetyl group. The reaction mechanism of CmlS is of considerable interest as it will help to explain how the FDH family can halogenate a wide range of substrates through a common mechanism. The protein has been recombinantly expressed in Escherichia coli and purified to homogeneity. The hanging-drop vapour-diffusion method was used to produce crystals that were suitable for X-ray diffraction. Data were collected to 2.0 Å resolution. The crystal belonged to space group C2, with unit-cell parameters a = 208.1, b = 57.7, c = 59.9 Å, β = 97.5°
A power-efficient switchable CML driver at 10 Gbps

International Nuclear Information System (INIS)

Chen Peipei; Li Lei; Liu Huihua

2016-01-01

High static power limits the application of conventional current-mode logic (CML). This paper presents a power-efficient switchable CML driver, which achieves a significant current saving of 75% compared with conventional ones. Implemented in a 130 nm CMOS technology process, the proposed CML driver occupies an area of only about 0.003 mm² and provides a robust differential signal of 1600 mV for a 10 Gbps optical line terminal (OLT) with a total current of 10 mA. The peak-to-peak jitter is about 4 ps (0.04 T_UI) and the offset voltage is 347.2 mV @ 1600 mV_PP. (paper)
Induction of CML28-specific cytotoxic T cell responses using co-transfected dendritic cells with CML28 DNA vaccine and SOCS1 small interfering RNA expression vector

International Nuclear Information System (INIS)

Zhou Hongsheng; Zhang Donghua; Wang Yaya; Dai Ming; Zhang Lu; Liu Wenli; Liu Dan; Tan Huo; Huang Zhenqian

2006-01-01

CML28 is an attractive target for antigen-specific immunotherapy. SOCS1 represents an inhibitory control mechanism for DC antigen presentation and the magnitude of adaptive immunity. In this study, we evaluated the potential for inducing CML28-specific cytotoxic T lymphocyte (CTL) responses by dendritic cell (DC)-based vaccination. We constructed a CML28 DNA vaccine and a SOCS1 siRNA vector and then cotransfected monocyte-derived DCs. Flow cytometry analysis showed that gene silencing of SOCS1 resulted in higher expression of costimulatory molecules in DCs. The mixed lymphocyte reaction (MLR) indicated that downregulation of SOCS1 gave DCs a stronger capability to stimulate proliferation of responder cells. The CTL assay revealed that transfected DCs effectively induced autologous CML28-specific CTL responses and that the lytic activities induced by SOCS1-silenced DCs were significantly higher than those induced by SOCS1-expressing DCs. These results indicate that gene silencing of SOCS1 remarkably enhanced the cytotoxic efficiency of the CML28 DNA vaccine in DCs.
Item Analysis in Introductory Economics Testing.

Science.gov (United States)

Tinari, Frank D.

1979-01-01

Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
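The kind of computerized item analysis mentioned above usually reports, for each item, a difficulty index and a discrimination index. The sketch below computes the proportion correct and the point-biserial correlation with the total score; the record does not say which statistics the article used, so these two are an assumption, as are the function names and the small response matrix. The total score here includes the item itself (an uncorrected item-total correlation).

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0/1)."""
    return mean(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    mi, mt = mean(item_scores), mean(total_scores)
    cov = mean([(i - mi) * (t - mt) for i, t in zip(item_scores, total_scores)])
    sd_i, sd_t = pstdev(item_scores), pstdev(total_scores)
    if sd_i == 0 or sd_t == 0:
        return float("nan")  # undefined when everyone has the same score
    return cov / (sd_i * sd_t)

# Hypothetical data: rows are examinees, columns are items (1 = correct).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]

difficulty = item_difficulty(first_item)
discrimination = point_biserial(first_item, totals)
```

Items with very high or very low difficulty, or with low or negative discrimination, are the usual candidates for revision.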
Human AQP5 plays a role in the progression of chronic myelogenous leukemia (CML).

Directory of Open Access Journals (Sweden)

Young Kwang Chae

2008-07-01

Full Text Available Aquaporins (AQPs) have previously been associated with increased expression in solid tumors. However, their expression in hematologic malignancies including CML has not been described yet. Here, we report the expression of AQP5 in CML cells by RT-PCR and immunohistochemistry. While normal bone marrow biopsy samples (n = 5) showed no expression of AQP5, 32% of CML patient samples (n = 41) demonstrated AQP5 expression. In addition, AQP5 expression level increased with the emergence of imatinib mesylate resistance in paired samples (p = 0.047). We have found that the overexpression of AQP5 in K562 cells resulted in increased cell proliferation. In addition, small interfering RNA (siRNA) targeting AQP5 reduced the cell proliferation rate in both K562 and LAMA84 CML cells. Moreover, by immunoblotting and flow cytometry, we show that phosphorylation of BCR-ABL1 is increased in AQP5-overexpressing CML cells and decreased in AQP5 siRNA-treated CML cells. Interestingly, caspase9 activity increased in AQP5 siRNA-treated cells. Finally, FISH showed no evidence of AQP5 gene amplification in CML from bone marrow. In summary, we report for the first time that AQP5 is overexpressed in CML cells and plays a role in promoting cell proliferation and inhibiting apoptosis. Furthermore, our findings may provide the basis for a novel CML therapy targeting AQP5.
Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

Science.gov (United States)

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

2016-01-01

High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
Turkish Chronic Myeloid Leukemia Study: Retrospective Sectional Analysis of CML Patients

Directory of Open Access Journals (Sweden)

Fahri Şahin

2013-12-01

Full Text Available OBJECTIVE: There have been tremendous changes in the treatment and follow-up of patients with chronic myeloid leukemia (CML) in the last decade. In particular, regular publication and updating of the NCCN and ELN guidelines have provided an enormous rationale and basis for close monitoring of patients with CML. However, retrospective registry results are still needed to evaluate daily CML practice. METHODS: In this article, we evaluated the results of 1133 patients with CML in terms of demographic features, disease status, response, resistance and use of second-generation TKIs. RESULTS: The response rate was found to be relatively high in comparison with previously published articles, and we detected a lack of appropriate and adequate molecular response assessment. CONCLUSION: We concluded that we need to improve registry systems and increase the availability of molecular response assessment to provide high-quality patient care.
CML/CD36 accelerates atherosclerotic progression via inhibiting foam cell migration.

Science.gov (United States)

Xu, Suining; Li, Lihua; Yan, Jinchuan; Ye, Fei; Shao, Chen; Sun, Zhen; Bao, Zhengyang; Dai, Zhiyin; Zhu, Jie; Jing, Lele; Wang, Zhongqun

2018-01-01

Among the various complications of type 2 diabetes mellitus, atherosclerosis causes the highest disability and morbidity. A multitude of macrophage-derived foam cells are retained in atherosclerotic plaques, resulting not only from recruitment of monocytes into lesions but also from a reduced rate of macrophage migration from lesions. Nε-carboxymethyl-lysine (CML), an advanced glycation end product, is responsible for most complications of diabetes. This study was designed to investigate the mechanism by which CML/CD36 accelerates atherosclerotic progression via inhibiting foam cell migration. Both in vivo and in vitro studies were performed. In the in vivo investigation, CML/CD36 accelerated atherosclerotic progression by promoting the accumulation of macrophage-derived foam cells in the aorta and inhibited the migration of macrophage-derived foam cells from the aorta to the para-aortic lymph node of diabetic apoE-/- mice. In the in vitro investigation, CML/CD36 inhibited RAW264.7-derived foam cell migration through NOX-derived ROS, FAK phosphorylation, Arp2/3 complex activation and F-actin polymerization. Thus, we concluded that CML/CD36 inhibited the migration of plaque foam cells to para-aortic lymph nodes, accelerating atherosclerotic progression. The corresponding mechanism may be via free cholesterol, ROS generation, p-FAK, Arp2/3 and F-actin polymerization. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

Chromosome abnormalities in the acute phase of CML

Energy Technology Data Exchange (ETDEWEB)

Rowley, J D

1978-01-01

Additional chromosome changes are superimposed on the Ph¹-positive cell line in approximately 80% of patients in the acute phase of chronic myelogenous leukemia (CML). These changes may precede the onset of blast crisis by several months. They are nonrandom and frequently involve an extra No. 8, an isochromosome for the long arm of No. 17, an extra No. 19, and a second Ph¹ chromosome. Since such changes may occur in combination, modal numbers frequently range between 47 and 57 chromosomes. Although present evidence suggests that abnormal clones originate, or at least proliferate, in the spleen, similar changes have been observed in patients who underwent splenectomy during the chronic phase of their disease. The question of particular clinical-chromosomal correlations has been discussed in only one study. It appeared that patients whose karyotype did not change might have a longer median survival than those whose karyotype showed additional abnormalities. Tests for levels of terminal deoxynucleotidyl transferase (TDT) and response to anti-acute lymphoblastic leukemia (ALL) serum suggest that some, but not all patients react as do patients with ALL. Those who are similar to ALL have high levels of TDT and are anti-ALL serum-positive; the others have low levels of TDT and are anti-ALL serum-negative. In the future, correlations of these more sophisticated tests with the blast morphology, clinical course, and karyotype pattern should provide significant new insights into the acute phase of CML.
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

Science.gov (United States)

Aybek, Eren Can; Demirtasli, R. Nukhet

2017-01-01

This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
DRUG THERAPY IN THE PROGRESSED CML PATIENT WITH MULTI-TKI FAILURE

Directory of Open Access Journals (Sweden)

Ibrahim C. Haznedaroglu

2015-02-01

Full Text Available The aim of this paper is to outline pharmacotherapy of the ‘third-line management of CML’ (progressive disease course after sequential TKI drugs). Current management of CML with multi-TKI failure is reviewed. TKI (bosutinib, ponatinib, dasatinib, nilotinib) and non-TKI (omacetaxine mepesuccinate, IFN or PEG-IFN) drugs are available. The literature search was made in PubMed with particular focus on clinical trials, recommendations, guidelines and expert opinions, as well as international recommendations. Progressing CML disease with multi-TKI failure should be treated with alloSCT based on the availability of the donor and EBMT transplant risk scores. The TKI and non-TKI drugs should be used to obtain the best achievable (hematological, cytogenetic, molecular) response. During the CP-CML phase of multi-TKI failure, 2nd generation TKIs (nilotinib or dasatinib) are used if they remain available. Bosutinib and ponatinib (3rd generation TKIs) can be administered in triple-TKI failed (imatinib and nilotinib and dasatinib) patients. The presence of the T315I mutation at any phase requires ponatinib or omacetaxine mepesuccinate therapy before allografting. During the AP/BC-CML phase of multi-TKI failure, the most powerful TKI available (ponatinib, or dasatinib if it remains available) together with chemotherapy should be given before alloSCT. Monitoring of CML disease and drug off-target risks (particularly vascular thrombotic events) is vital.
Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

Science.gov (United States)

Gierl, Mark J.; Lai, Hollis

2013-01-01

Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
Electronics. Criterion-Referenced Test (CRT) Item Bank.

Science.gov (United States)

Davis, Diane, Ed.

This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
Immunohistochemical study of N-epsilon-carboxymethyl lysine (CML) in human brain: relation to vascular dementia

Directory of Open Access Journals (Sweden)

Williams Jonathan

2007-10-01

Full Text Available Background: Advanced glycation end-products (AGEs) and their receptor (RAGE) occur in dementia of the Alzheimer's type and diabetic microvascular disease. Accumulation of AGEs relates to risk factors for vascular dementia with ageing, including hypertension and diabetes. Cognitive dysfunction in vascular dementia may relate to microvascular disease resembling that in diabetes. We tested whether, among people with cerebrovascular disease, (1) those with dementia have higher levels of neuronal and vascular AGEs and (2) cognitive dysfunction depends on neuronal and/or vascular AGE levels. Methods: Brain sections from 25 cases of the OPTIMA (Oxford Project to Investigate Memory and Ageing) cohort, with varying degrees of cerebrovascular pathology and cognitive dysfunction (but only minimal Alzheimer-type pathology), were immunostained for Nε-(carboxymethyl)-lysine (CML), the most abundant AGE. The level of staining in vessels and neurons in the cortex, white matter and basal ganglia was compared to neuropsychological and other clinical measures. Results: The probability of cortical neurons staining positive for CML was higher in cases with worse cognition (p = 0.01) or a history of hypertension (p = 0.028). Additionally, vascular CML staining related to cognitive impairment (p = 0.02) and a history of diabetes (p = 0.007). Neuronal CML staining in the basal ganglia related to a history of hypertension (p = 0.002). Conclusion: CML staining in cortical neurons and cerebral vessels is related to the severity of cognitive impairment in people with cerebrovascular disease and only minimal Alzheimer pathology. These findings support the possibility that cerebral accumulation of AGEs may contribute to dementia in people with cerebrovascular disease.
Tyrosine kinase inhibitors therapy related neutropenia and thrombocytopenia correction in CML patients

Directory of Open Access Journals (Sweden)

V. A. Shuvaev

2014-07-01

Full Text Available At present, the introduction of targeted therapy into chronic myelogenous leukemia (CML) treatment has made CML a disorder that is not life-limiting. The main condition for treatment efficacy is its continuity. The most common causes of dose reduction and interruption of CML therapy are hematologic toxicities such as neutropenia and thrombocytopenia. Correction of adverse events in these circumstances is vital. Recommendations for neutropenia and thrombocytopenia correction are proposed in this article. The rationale for and results of the use of granulocyte colony stimulating factor (G-CSF) and thrombopoietin receptor agonists for correction of hematologic toxicities are presented, together with a clinical case.
Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
The calmodulin-like protein, CML39, is involved in regulating seed development, germination, and fruit development in Arabidopsis.

Science.gov (United States)

Midhat, Ubaid; Ting, Michael K Y; Teresinski, Howard J; Snedden, Wayne A

2018-03-01

We show that the calcium sensor, CML39, is important in various developmental processes from seeds to mature plants. This study bridges previous work on CML39 as a stress-induced gene and highlights the importance of calcium signalling in plant development. In addition to the evolutionarily-conserved Ca2+ sensor, calmodulin (CaM), plants possess a large family of CaM-related proteins (CMLs). Using a cml39 loss-of-function mutant, we investigated the roles of CML39 in Arabidopsis and discovered a range of phenotypes across developmental stages and in different tissues. In mature plants, loss of CML39 results in shorter siliques, reduced seed number per silique, and reduced number of ovules per pistil. We also observed changes in seed development, germination, and seed coat properties in cml39 mutants in comparison to wild-type plants. Using radicle emergence as a measure of germination, cml39 mutants showed more rapid germination than wild-type plants. In marked contrast to wild-type seeds, the germination of developing, immature cml39 seeds was not sensitive to cold-stratification. In addition, germination of cml39 seeds was less sensitive than wild-type to inhibition by ABA or by treatments that impaired gibberellic acid biosynthesis. Tetrazolium red staining indicated that the seed-coat permeability of cml39 seeds is greater than that of wild-type seeds. RNA sequencing analysis of cml39 seedlings suggests that changes in chromatin modification may underlie some of the phenotypes associated with cml39 mutants, consistent with previous reports that orthologs of CML39 participate in gene silencing. Aberrant ectopic expression of transcripts for seed storage proteins in 7-day old cml39 seedlings was observed, suggesting mis-regulation of early developmental programs. Collectively, our data support a model where CML39 serves as an important Ca2+ sensor during ovule and seed development, as well as during germination and seedling establishment.
Guide to good practices for the development of test items

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-01-01

While the methodology used in developing test items can vary significantly, to ensure quality examinations, test items should be developed systematically. Test design and development is discussed in the DOE Guide to Good Practices for Design, Development, and Implementation of Examinations. This guide is intended to be a supplement by providing more detailed guidance on the development of specific test items. This guide addresses the development of written examination test items primarily. However, many of the concepts also apply to oral examinations, both in the classroom and on the job. This guide is intended to be used as guidance for the classroom and laboratory instructor or curriculum developer responsible for the construction of individual test items. This document focuses on written test items, but includes information relative to open-reference (open book) examination test items, as well. These test items have been categorized as short-answer, multiple-choice, or essay. Each test item format is described, examples are provided, and a procedure for development is included. The appendices provide examples for writing test items, a test item development form, and examples of various test item formats.
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

Science.gov (United States)

Sahin, Alper; Anil, Duygu

2017-01-01

This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
Appearance and Disappearance of Chronic Myeloid Leukemia (CML) in Patient with Chronic Lymphocytic Leukemia (CLL).

Science.gov (United States)

Payandeh, Mehrdad; Sadeghi, Edris; Khodarahmi, Reza; Sadeghi, Masoud

2014-10-01

Chronic lymphocytic leukemia (CLL) and chronic myeloid leukemia (CML) are the most common leukemias of the elderly (>43 years). However, the sequential occurrence of CML followed by CLL in the same patient is extremely rare. In our report, a 52-year-old female was diagnosed with CLL (type of bone marrow (BM) infiltration was nodular and interstitial) and was treated with chlorambucil. 64 months after the diagnosis of CLL, she developed CML. She was treated with imatinib (400 mg/day). After a few months, signs of CML disappeared and CLL became dominant. This is the first reported case.
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

Science.gov (United States)

Australian Council for Educational Research, Hawthorn.

The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
Assessing difference between classical test theory and item ...

African Journals Online (AJOL)

Assessing difference between classical test theory and item response theory methods in scoring primary four multiple choice objective test items. ... All research participants were ranked on the CTT number correct scores and the corresponding IRT item pattern scores from their performance on the PRISMADAT. Wilcoxon ...
Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

Science.gov (United States)

Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

2017-06-15

Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Frequency of BCR-ABL Transcript Types in Syrian CML Patients

Directory of Open Access Journals (Sweden)

Sulaf Farhat-Maghribi

2016-01-01

Background. In Syria, CML patients are started on tyrosine kinase inhibitors (TKIs) and monitored until complete molecular response is achieved. BCR-ABL mRNA transcript type is not routinely identified, contrary to the recommendations. In this study we aimed to identify the frequency of different BCR-ABL transcripts in Syrian CML patients and highlight their significance on monitoring and treatment protocols. Methods. CML patients positive for BCR-ABL transcripts by quantitative RT-PCR were enrolled. BCR-ABL transcript types were investigated using a home-made PCR method that was adapted from published protocols and optimized. The transcript types were then confirmed using a commercially available research kit. Results. Twenty-four transcripts were found in 21 patients. The most common was b2a2, followed by b3a2, b3a3, and e1a3, present solely in 12 (57.1%), 3 (14.3%), 2 (9.5%), and 1 (4.8%) patients, respectively. Three samples (14.3%) contained dual transcripts. While b3a2 transcript was apparently associated with warning molecular response to imatinib treatment, b2a2, b3a3, and e1a3 transcripts collectively proved otherwise (P=0.047). Conclusion. It might be advisable to identify the BCR-ABL transcript type in CML patients at diagnosis, using an empirically verified method, in order to link the detected transcript with the clinical findings, possible resistance to treatment, and appropriate monitoring methods.
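The counts reported in the Results can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the figures quoted in the abstract; the variable names are illustrative, and it assumes that each dual-transcript sample contributes exactly two transcripts.

```python
# Patients carrying a single transcript type, as quoted in the abstract.
single = {"b2a2": 12, "b3a2": 3, "b3a3": 2, "e1a3": 1}
# Samples with dual transcripts (assumed here to carry exactly two each).
dual_patients = 3

patients = sum(single.values()) + dual_patients        # total patients
transcripts = sum(single.values()) + 2 * dual_patients  # total transcripts

# Share of all patients for each single-transcript group, in percent.
percent = {name: round(100 * n / patients, 1) for name, n in single.items()}
dual_percent = round(100 * dual_patients / patients, 1)

print(patients, transcripts, percent, dual_percent)
```

Under that assumption the totals agree with the stated 21 patients and 24 transcripts, and the rounded shares agree with the percentages given in the abstract.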
The semantics of Chemical Markup Language (CML): dictionaries and conventions

Science.gov (United States)

2011-01-01

The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs. PMID:21999509
The semantics of Chemical Markup Language (CML): dictionaries and conventions.

Science.gov (United States)

Murray-Rust, Peter; Townsend, Joe A; Adams, Sam E; Phadungsukanan, Weerapong; Thomas, Jens

2011-10-14

The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.

Binomial test models and item difficulty

NARCIS (Netherlands)

van der Linden, Willem J.

1979-01-01

In choosing a binomial test model, it is important to know exactly what conditions are imposed on item difficulty. In this paper these conditions are examined for both a deterministic and a stochastic conception of item responses. It appears that they are more restrictive than is generally
Evaluation of multielements in human serum of patients with chronic myelogenous leukemia (CML) using SRTXRF

International Nuclear Information System (INIS)

Leitao, Catarine Canellas Gondim

2005-04-01

In this work, trace elements were analyzed in serum of patients with chronic myelogenous leukemia (CML) by Total Reflection X-Ray Fluorescence using synchrotron radiation (SRTXRF). Chronic myelogenous leukemia (CML) affects the myeloid cells in the blood, affects 1 to 2 people per 100,000, and accounts for 7-20% of cases of leukemia. Sixty patients with CML and sixty healthy volunteers (control group) were studied. Blood was collected into vacutainers without additives. Directly after collection, each blood sample was centrifuged at 3000 rev/min for 10 min in order to separate blood cells and suspended particles from blood serum. Sera were transferred into polyethylene tubes and stored in a freezer at 253 K. A 500 µL serum quantity was spiked with Ga (50 µL) as internal standard. 10 µL aliquots were pipetted onto a Perspex sample carrier. After deposition, the samples were left to dry under an infrared lamp. The measurements were performed at the X-Ray Fluorescence Beamline at the Brazilian National Synchrotron Light Laboratory (LNLS), using a polychromatic beam. Standard solutions with gallium as internal standard were prepared for system calibration. It was possible to determine the concentrations of the following elements: P, S, Cl, K, Ca, Cr, Mn, Fe, Ni, Cu, Zn, Br and Rb. The ANOVA test showed that the elements P, S, Ca, Cr, Mn, Fe, Cu and Rb presented significant differences (α = 0.05) between groups (healthy subjects and CML patients) and between sexes (males and females). (author)
Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

Science.gov (United States)

Baghaei, Purya; Ravand, Hamdollah

2016-01-01

In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Appearance and Disappearance of Chronic Myeloid Leukemia (CML) in Patient with Chronic Lymphocytic Leukemia (CLL)

OpenAIRE

Payandeh, Mehrdad; Sadeghi, Edris; Khodarahmi, Reza; Sadeghi, Masoud

2014-01-01

Chronic lymphocytic leukemia (CLL) and chronic myeloid leukemia (CML) are the most common leukemias of the elderly (>43 year). However, the sequential occurrence of CML followed by CLL in the same patient is extremely rare. In our report, a 52-year-old female was diagnosed with CLL (type of bone marrow (BM) infiltration was nodular and interstitial) and was treated with chlorambucil. 64 months after the diagnosis of CLL, she developed CML. She was treated with imatinib (400mg/day). After a fe...
Item Response Theory Models for Performance Decline during Testing

Science.gov (United States)

Jin, Kuan-Yu; Wang, Wen-Chung

2014-01-01

Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Management of CML in the Pediatric Age Group: Imatinib Mesylate or SCT.

Science.gov (United States)

El-Alfy, Mohsen S; Al-Haddad, Alaa M; Hamed, Ahmed A

2010-12-01

Management of CML has changed markedly since the introduction of tyrosine kinase inhibitors (TKIs). However, stem cell transplantation (SCT) remains a valid therapeutic modality, especially in developing countries, due to its relatively lower cost. We aim to compare imatinib mesylate and SCT with regard to outcome in CML in the pediatric age group. Forty-eight patients with newly diagnosed CML in the chronic phase, aged 3 to 18 years, were enrolled in this prospective study. Patients without a matched donor (Group I; N=30) were assigned to receive imatinib mesylate at a dose of 340 mg/m2/day, while patients with a fully matched related donor (Group II; N=18) were offered SCT. Response (hematologic, cytogenetic and molecular), side effects and survival were analyzed. Complete hematologic response was obtained in 97% of the patients in group I and 94% in group II. Major cytogenetic response (CyR) was obtained in 80% of patients in group I and 100% in group II. Complete CyR was 57% in group I and 64% in group II. Major molecular response (MMR) was 36% in group I and 50% in group II, with no significant difference between the two groups. Six-year overall survival (OS) was 87% in the 1st group and 61% in the 2nd group (pSCT group (55% had GVHD and 78% had infection). Imatinib mesylate yields superior OS and EFS compared with SCT in children. It is generally safe and well tolerated. Imatinib mesylate should be the 1st line treatment of pediatric patients with CML in the chronic phase. CML- Imatinib- SCT- Pediatrics.
Item response theory analysis of the mechanics baseline test

Science.gov (United States)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
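The point that a skill estimate depends on which questions were answered correctly, not only on how many, can be illustrated with a small sketch. The code below assumes a two-parameter logistic (2PL) model and made-up item parameters; the abstract does not state which model or software the authors used, so `p_correct`, `estimate_skill` and `ITEMS` are purely illustrative.

```python
import math

def p_correct(theta, a, b):
    """2PL probability that a student of skill theta answers an item
    with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_skill(responses, items):
    """Maximum-likelihood skill estimate found by a simple grid search
    over theta in [-4, 4] (step 0.01)."""
    grid = [i / 100.0 for i in range(-400, 401)]

    def log_lik(theta):
        total = 0.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            total += math.log(p if u == 1 else 1.0 - p)
        return total

    return max(grid, key=log_lik)

# Hypothetical (discrimination, difficulty) pairs, not taken from the study.
ITEMS = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

# Two students with the same raw score (2 of 3) but different patterns.
skill_a = estimate_skill([1, 1, 0], ITEMS)  # missed the hard, discriminating item
skill_b = estimate_skill([0, 1, 1], ITEMS)  # missed the easy, weak item
print(skill_a, skill_b)
```

In this toy setting the second student receives the higher estimate, because the items answered correctly carry more weight under the assumed model.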
Do endothelial cells belong to the primitive stem leukemic clone in CML? Role of extracellular vesicles.

Science.gov (United States)

Ramos, Teresa L; Sánchez-Abarca, Luis Ignacio; López-Ruano, Guillermo; Muntión, Sandra; Preciado, Silvia; Hernández-Ruano, Montserrat; Rosado, Belén; de las Heras, Natalia; Chillón, M Carmen; Hernández-Hernández, Ángel; González, Marcos; Sánchez-Guijo, Fermín; Del Cañizo, Consuelo

2015-08-01

The expression of BCR-ABL in hematopoietic stem cells is a well-defined primary event in chronic myeloid leukemia (CML). Some reports have described the presence of BCR-ABL on endothelial cells from CML patients, suggesting the origin of the disease in a primitive hemangioblastic cell. On the other hand, extracellular vesicles (EVs) released by CML leukemic cells are involved in the angiogenesis modulation process. In the current work we hypothesized that EVs released from BCR-ABL(+) cells may carry inside the oncogene that can be transferred to endothelial cells leading to the expression of both BCR-ABL transcript and the oncoprotein. EVs from K562 cells and plasma of newly diagnosed CML patients were isolated by ultracentrifugation. RT-PCR analysis detected the presence of BCR-ABL RNA in the EVs isolated from both K562 cells and plasma of CML patients. The incorporation of these EVs into endothelial cells was demonstrated by flow cytometry, and fluorescence microscopy showed that after 24h of incubation most EVs were incorporated. BCR-ABL transcripts were detected in all experiments on endothelial cells incubated with EVs from both sources. The presence of BCR-ABL on endothelial cells incubated with Philadelphia(+) EVs was also confirmed by Western blot assays. In summary, endothelial cells acquire BCR-ABL RNA and the oncoprotein after incubation with EVs released from Ph(+) cells (either from K562 cells or from plasma of newly diagnosed CML patients). These results challenge the hypothesis that endothelial cells may be part of the Philadelphia(+) clone in CML. Copyright © 2015 Elsevier Ltd. All rights reserved.
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

Science.gov (United States)

Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

2016-01-01

Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Procedures for Selecting Items for Computerized Adaptive Tests.

Science.gov (United States)

Kingsbury, G. Gage; Zara, Anthony R.

1989-01-01

Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)
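One widely used item selection rule in CAT is to administer the unused item with the greatest Fisher information at the current ability estimate. The sketch below assumes a 2PL model and a hypothetical three-item pool; the abstract does not specify which rules the review covers, so this is an illustration of the general idea rather than the authors' procedure.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta, pool, administered):
    """Return the index of the not-yet-administered item that is most
    informative at the current ability estimate theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

# Hypothetical pool of (discrimination, difficulty) pairs.
POOL = [(1.0, -2.0), (1.0, 0.2), (1.5, 1.0)]

first = select_next_item(0.0, POOL, set())
second = select_next_item(0.0, POOL, {first})
low_ability_pick = select_next_item(-2.0, POOL, set())
print(first, second, low_ability_pick)
```

A constrained CAT of the kind the abstract mentions would narrow `candidates` further (for example by content area or exposure limits) before taking the maximum.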
Hoxa9 and Hoxa10 induce CML myeloid blast crisis development through activation of Myb expression.

Science.gov (United States)

Negi, Vijay; Vishwakarma, Bandana A; Chu, Su; Oakley, Kevin; Han, Yufen; Bhatia, Ravi; Du, Yang

2017-11-17

Mechanisms underlying the progression of Chronic Myeloid Leukemia (CML) from chronic phase to myeloid blast crisis are poorly understood. Our previous studies have suggested that overexpression of SETBP1 can drive this progression by conferring unlimited self-renewal capability to granulocyte macrophage progenitors (GMPs). Here we show that overexpression of Hoxa9 or Hoxa10, both transcriptional targets of Setbp1, is also sufficient to induce self-renewal of primary myeloid progenitors, causing their immortalization in culture. More importantly, both are able to cooperate with BCR/ABL to consistently induce transformation of mouse GMPs and development of aggressive leukemias resembling CML myeloid blast crisis, suggesting that either gene can drive CML progression by promoting the self-renewal of GMPs. We further identify Myb as a common critical target for Hoxa9 and Hoxa10 in inducing self-renewal of myeloid progenitors as Myb knockdown significantly reduced colony-forming potential of myeloid progenitors immortalized by the expression of either gene. Interestingly, Myb is also capable of immortalizing primary myeloid progenitors in culture and cooperating with BCR/ABL to induce leukemic transformation of mouse GMPs. Significantly increased levels of MYB transcript also were detected in all human CML blast crisis samples examined over chronic phase samples, further suggesting the possibility that MYB overexpression may play a prevalent role in driving human CML myeloid blast crisis development. In summary, our results identify overexpression of HOXA9, HOXA10, and MYB as critical drivers of CML progression, and suggest MYB as a key therapeutic target for inhibiting the self-renewal of leukemia-initiating cells in CML myeloid blast crisis patients.
Effect of Differential Item Functioning on Test Equating

Science.gov (United States)

Kabasakal, Kübra Atalay; Kelecioglu, Hülya

2015-01-01

This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
The Role of Item Feedback in Self-Adapted Testing.

Science.gov (United States)

Roos, Linda L.; And Others

1997-01-01

The importance of item feedback in self-adapted testing was studied by comparing feedback and no feedback conditions for computerized adaptive tests and self-adapted tests taken by 363 college students. Results indicate that item feedback is not necessary to realize score differences between self-adapted and computerized adaptive testing. (SLD)
Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.

Science.gov (United States)

Commons, C., Ed.; Martin, P., Ed.

Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…
Algorithms for computerized test construction using classical item parameters

NARCIS (Netherlands)

Adema, Jos J.; van der Linden, Willem J.

1989-01-01

Recently, linear programming models for test construction were developed. These models were based on the information function from item response theory. In this paper another approach is followed. Two 0-1 linear programming models for the construction of tests using classical item and test
Droplet Digital PCR for BCR/ABL(P210) Detecting of CML: A High Sensitive Method of the Minimal Residual Disease & Disease Progression.

Science.gov (United States)

Wang, Wen-Jun; Zheng, Chao-Feng; Liu, Zhuang; Tan, Yan-Hong; Chen, Xiu-Hua; Zhao, Bin-Liang; Li, Guo-Xia; Xu, Zhi-Fang; Ren, Fang-Gang; Zhang, Yao-Fang; Chang, Jian-Mei; Wang, Hong-Wei

2018-04-25

The present study intended to establish a droplet digital PCR (dd-PCR) for monitoring minimal residual disease (MRD) in patients with BCR/ABL (P210)-positive CML, thereby achieving deep-level monitoring of tumor load and determining the efficacy for guided clinically individualized treatment. Using dd-PCR and RT-qPCR, two cell suspensions were obtained from K562 cells and normal peripheral blood mononuclear cells by gradient dilution and were measured at the cellular level. At peripheral blood (PB) level, 61 cases with CML-chronic phase (CML-CP) were obtained after tyrosine kinase inhibitors (TKIs) treatment and regular follow-ups. By RT-qPCR, BCR/ABL (P210) fusion gene was undetectable in PB after three successive analyses, which were performed once every three months. At the same time, dd-PCR was performed simultaneously with the last equal amount of cDNA. Ten CML patients with MR4.5 were followed up by the two methods. At the cellular level, consistency of results of dd-PCR and RT-qPCR reached R^2 ≥ 0.99, with conversion equation of Y = 33.148X^1.222 (Y: dd-PCR results; X: RT-qPCR results). In the dd-PCR test, 11 of the 61 CML patients (18.03%) tested positive and showed statistically significant difference (PPCR 3 months earlier than by RT-qPCR. In contrast with RT-qPCR, dd-PCR is more sensitive, thus enabling accurate conversion of dd-PCR results into internationally standard RT-qPCR results by conversion equation, to achieve a deeper molecular biology-based stratification of BCR/ABL(P210) MRD. It has some reference value to monitor disease progression in clinic. This article is protected by copyright. All rights reserved.
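The conversion equation quoted in the abstract can be written as a pair of small functions. This sketch assumes that the equation is a power law, i.e. Y = 33.148 multiplied by X raised to the power 1.222 (the exponent is inferred from the flattened typesetting of the source record), and it makes no assumption about measurement units; the function names are illustrative.

```python
COEFFICIENT = 33.148  # multiplicative constant quoted in the abstract
EXPONENT = 1.222      # assumed to be an exponent on X

def rtqpcr_to_ddpcr(x):
    """Predicted dd-PCR result Y for an RT-qPCR result X (power-law reading)."""
    return COEFFICIENT * x ** EXPONENT

def ddpcr_to_rtqpcr(y):
    """Inverse mapping: RT-qPCR-scale value X for a dd-PCR result Y."""
    return (y / COEFFICIENT) ** (1.0 / EXPONENT)

# Round trip on an arbitrary positive value to show the two are inverses.
x = 0.01
y = rtqpcr_to_ddpcr(x)
print(y, ddpcr_to_rtqpcr(y))
```

The inverse function corresponds to the use described in the abstract, where a dd-PCR result is converted back onto the RT-qPCR scale.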
Detection of differential item functioning using Lagrange multiplier tests

NARCIS (Netherlands)

Glas, Cornelis A.W.

1996-01-01

In this paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or C. R. Rao's efficient score test. The test is presented in the framework of a number of item response theory (IRT) models such as the Rasch model, the one-parameter logistic model, the
Diagnosis and Treatment of Chronic Myeloid Leukemia (CML) in 2015

Science.gov (United States)

Thompson, Philip A; Kantarjian, Hagop; Cortes, Jorge E

2017-01-01

Few neoplastic diseases have undergone a transformation in a relatively short period of time like chronic myeloid leukemia (CML) has in the last few years. In 1960, CML was the first cancer where a unique chromosomal abnormality, “a minute chromosome” [1], was identified and a pathophysiologic correlation suggested. Landmark work followed, recognizing the underlying translocation between chromosomes 9 and 22 that gave rise to this abnormality [2] and shortly afterward, the specific genes involved [3,4] and the pathophysiologic implications of this novel rearrangement [5–7]. Fast-forward a few years, this knowledge has given us the most remarkable example of a specific therapy targeting the dysregulated kinase activity represented by this molecular change. The broad use of tyrosine kinase inhibitors has resulted in an improvement in the overall survival to the point where the life expectancy of patients today is nearly equal to that of the general population [8]. Still, there are challenges and unanswered questions that define the reasons why the progress still escapes many patients, and the details that separate patients from ultimate “cure”. In this manuscript we review our current understanding of CML in 2015, present recommendations for optimal management, and discuss the unanswered questions and what could be done to answer them in the near future. PMID:26434969
ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).

Science.gov (United States)

Australian Council for Educational Research, Hawthorn.

This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…
Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

Science.gov (United States)

Wang, Wei

2013-01-01

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.

Science.gov (United States)

Commons, C., Ed.; Martin, P., Ed.

The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

Science.gov (United States)

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Bayes Factor Covariance Testing in Item Response Models.

Science.gov (United States)

Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

2017-12-01

Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Mathematical-programming approaches to test item pool design

NARCIS (Netherlands)

Veldkamp, Bernard P.; van der Linden, Willem J.; Ariel, A.

2002-01-01

This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming
Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

Science.gov (United States)

Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

2013-01-01

Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…
The "Sniffin' Kids" test--a 14-item odor identification test for children.

Directory of Open Access Journals (Sweden)

Valentin A Schriever

Tools for measuring olfactory function in adults have been well established. Although studies have shown that olfactory impairment in children may occur as a consequence of a number of diseases or head trauma, until today no consensus on how to evaluate the sense of smell in children exists in Europe. The aim of the study was to develop a modified "Sniffin' Sticks" odor identification test, the "Sniffin' Kids" test, for use in children. In this study 537 children between 6-17 years of age were included. Fourteen odors, which were identified at a high rate by children, were selected from the "Sniffin' Sticks" 16-item odor identification test. Normative data for the 14-item "Sniffin' Kids" odor identification test were obtained. The test was validated by including a group of congenitally anosmic children. Results show that the "Sniffin' Kids" test is able to discriminate between normosmia and anosmia with a cutoff value of >7 points on the odor identification test. In addition the test-retest reliability was investigated in a group of 31 healthy children and shown to be ρ = 0.44. With the 14-item odor identification "Sniffin' Kids" test we present a valid and reliable test for measuring olfactory function in children between ages 6-17 years.
A person fit test for IRT models for polytomous items

NARCIS (Netherlands)

Glas, Cornelis A.W.; Dagohoy, A.V.

2007-01-01

A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability
Disrupting BCR-ABL in combination with secondary leukemia-specific pathways in CML cells leads to enhanced apoptosis and decreased proliferation.

Science.gov (United States)

Woessner, David W; Lim, Carol S

2013-01-07

Chronic myeloid leukemia (CML) is a myeloproliferative disorder caused by expression of the fusion gene BCR-ABL following a chromosomal translocation in the hematopoietic stem cell. Therapeutic management of CML uses tyrosine kinase inhibitors (TKIs), which block ABL-signaling and effectively kill peripheral cells with BCR-ABL. However, TKIs are not curative, and chronic use is required in order to treat CML. The primary failure for TKIs is through the development of a resistant population due to mutations in the TKI binding regions. This led us to develop the mutant coiled-coil, CC(mut2), an alternative method for BCR-ABL signaling inhibition by targeting the N-terminal oligomerization domain of BCR, necessary for ABL activation. In this article, we explore additional pathways that are important for leukemic stem cell survival in K562 cells. Using a candidate-based approach, we test the combination of CC(mut2) and inhibitors of unique secondary pathways in leukemic cells. Transformative potential was reduced following silencing of the leukemic stem cell factor Alox5 by RNA interference. Furthermore, blockade of the oncogenic protein MUC-1 by the novel peptide GO-201 yielded reductions in proliferation and increased cell death. Finally, we found that inhibiting macroautophagy using chloroquine in addition to blocking BCR-ABL signaling with the CC(mut2) was most effective in limiting cell survival and proliferation. This study has elucidated possible combination therapies for CML using novel blockade of BCR-ABL and secondary leukemia-specific pathways.
A comparison of discriminant logistic regression and Item Response Theory Likelihood-Ratio Tests for Differential Item Functioning (IRTLRDIF) in polytomous short tests.

Science.gov (United States)

Hidalgo, María D; López-Martínez, María D; Gómez-Benito, Juana; Guilera, Georgina

2016-01-01

Short scales are typically used in the social, behavioural and health sciences. This is relevant since test length can influence whether items showing DIF are correctly flagged. This paper compares the relative effectiveness of discriminant logistic regression (DLR) and IRTLRDIF for detecting DIF in polytomous short tests. A simulation study was designed. Test length, sample size, DIF amount and number of item response categories were manipulated. Type I error and power were evaluated. IRTLRDIF and DLR yielded Type I error rates close to the nominal level in no-DIF conditions. Under DIF conditions, Type I error rates were affected by test length, DIF amount, degree of test contamination, sample size and number of item response categories. DLR showed a higher Type I error rate than did IRTLRDIF. Power rates were affected by DIF amount and sample size, but not by test length. DLR achieved higher power rates than did IRTLRDIF in very short tests, although the high Type I error rate involved means that this result cannot be taken into account. Test length had an important impact on the Type I error rate. IRTLRDIF and DLR showed a low power rate in short tests and with small sample sizes.
Evaluating an Automated Number Series Item Generator Using Linear Logistic Test Models

Directory of Open Access Journals (Sweden)

Bao Sheng Loe

2018-04-01

This study investigates the item properties of a newly developed Automatic Number Series Item Generator (ANSIG). The foundation of the ANSIG is based on five hypothesised cognitive operators. Thirteen item models were developed using the numGen R package and eleven were evaluated in this study. The 16-item ICAR (International Cognitive Ability Resource) short form ability test was used to evaluate construct validity. The Rasch Model and two Linear Logistic Test Models (LLTM) were employed to estimate and predict the item parameters. Results indicate that a single factor determines the performance on tests composed of items generated by the ANSIG. Under the LLTM approach, all the cognitive operators were significant predictors of item difficulty. Moderate to high correlations were evident between the number series items and the ICAR test scores, with high correlation found for the ICAR Letter-Numeric-Series type items, suggesting adequate nomothetic span. Extended cognitive research is, nevertheless, essential for the automatic generation of an item pool with predictable psychometric properties.
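As an illustration of the LLTM idea mentioned in this abstract, the sketch below decomposes each item's Rasch difficulty into a weighted sum of cognitive-operator effects. The Q-matrix, the operator effects and the function names are hypothetical values chosen for the example; they are not taken from the study or from the numGen package.

```python
import math

def lltm_difficulties(q_matrix, eta):
    """LLTM decomposition: b_i = sum_k q_ik * eta_k for every item i."""
    return [sum(q * e for q, e in zip(row, eta)) for row in q_matrix]

def rasch_probability(theta, b):
    """Rasch model: probability of a correct response given ability and difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical design: 3 items, 2 cognitive operators.
# Q[i][k] = 1 if item i requires operator k, else 0.
Q = [[1, 0], [0, 1], [1, 1]]
ETA = [0.5, 1.0]  # assumed difficulty contribution of each operator

difficulties = lltm_difficulties(Q, ETA)
```

Under this decomposition an item that needs both operators is harder than one that needs either operator alone, which is what allows item difficulty to be predicted from the generating rules.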
Criteria for eliminating items of a Test of Figural Analogies

Directory of Open Access Journals (Sweden)

Diego Blum

2013-12-01

This paper describes the steps taken to eliminate two of the items in a Test of Figural Analogies (TFA). The main guidelines of psychometric analysis concerning Classical Test Theory (CTT) and Item Response Theory (IRT) are explained. The item elimination process was based on both the study of the CTT difficulty and discrimination index, and the unidimensionality analysis. The a, b, and c parameters of the Three Parameter Logistic Model of IRT were also considered for this purpose, as well as the assessment of each item fitting this model. The unfavourable characteristics of a group of TFA items are detailed, and decisions leading to their possible elimination are discussed.
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

Science.gov (United States)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
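For readers unfamiliar with the three-parameter logistic model named in this abstract, a minimal sketch of its item characteristic curve follows. The parameter values are hypothetical, and the form shown omits the scaling constant D = 1.7 that some formulations include; it is not a reproduction of the PARSCALE analysis.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL item characteristic curve.

    theta: examinee ability
    a: discrimination, b: difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty,
# guessing level typical of a five-option multiple-choice item.
curve = [p_correct_3pl(t, a=1.2, b=0.0, c=0.2) for t in (-3, -1, 0, 1, 3)]
```

At theta equal to b the probability is halfway between c and 1, and for very low ability the curve approaches c rather than 0.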
An Effect Size Measure for Raju's Differential Functioning for Items and Tests

Science.gov (United States)

Wright, Keith D.; Oshima, T. C.

2015-01-01

This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
Computerized adaptive testing item selection in computerized adaptive learning systems

NARCIS (Netherlands)

Eggen, Theodorus Johannes Hendrikus Maria; Eggen, T.J.H.M.; Veldkamp, B.P.

2012-01-01

Item selection methods traditionally developed for computerized adaptive testing (CAT) are explored for their usefulness in item-based computerized adaptive learning (CAL) systems. While in CAT Fisher information-based selection is optimal, for recovering learning populations in CAL systems item
Coupled map lattice (CML) approach to power reactor dynamics (I) - preservation of normality

International Nuclear Information System (INIS)

Konno, H.

1996-01-01

An application of a coupled map lattice (CML) model for simulating power fluctuations in nuclear power reactors is presented. (1) Preservation of Gaussianity in the point model is studied in a chaotic force driven Langevin equation in conjunction with the Gaussian-white noise driven Langevin equation. (2) Preservation of Gaussianity is also studied in the space-dependent model with the use of a CML model near the onset of the Hopf bifurcation point. It is shown that the spatial dimensionality decreases as the maximum eigenvalue of the system increases. The result is consistent with the observation of neutron fluctuation in a BWR. (author)
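To make the term concrete, here is a minimal sketch of a generic one-dimensional coupled map lattice with diffusive nearest-neighbour coupling and periodic boundaries. The logistic local map, the coupling strength and the lattice size are assumptions for illustration only; the abstract does not specify the reactor model's local dynamics.

```python
def logistic_map(x, r=3.9):
    """Local map applied at every lattice site (assumed, not from the paper)."""
    return r * x * (1.0 - x)

def cml_step(state, eps=0.1, r=3.9):
    """One update: x_i <- (1 - eps) f(x_i) + (eps / 2) [f(x_{i-1}) + f(x_{i+1})]."""
    n = len(state)
    fx = [logistic_map(x, r) for x in state]
    # fx[i - 1] wraps around at i = 0, giving periodic boundaries.
    return [
        (1.0 - eps) * fx[i] + 0.5 * eps * (fx[i - 1] + fx[(i + 1) % n])
        for i in range(n)
    ]

def run_cml(state, steps, eps=0.1, r=3.9):
    """Iterate the lattice for a fixed number of steps."""
    for _ in range(steps):
        state = cml_step(state, eps, r)
    return state

initial = [0.1 + 0.8 * i / 15 for i in range(16)]
final = run_cml(initial, steps=100)
```

Because each new value is a convex combination of logistic-map outputs, the state stays inside the unit interval whenever the initial values do.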
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

Directory of Open Access Journals (Sweden)

Suttida Rakkapao

2016-10-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test’s distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
Prostaglandin E1 and Its Analog Misoprostol Inhibit Human CML Stem Cell Self-Renewal via EP4 Receptor Activation and Repression of AP-1.

Science.gov (United States)

Li, Fengyin; He, Bing; Ma, Xiaoke; Yu, Shuyang; Bhave, Rupali R; Lentz, Steven R; Tan, Kai; Guzman, Monica L; Zhao, Chen; Xue, Hai-Hui

2017-09-07

Effective treatment of chronic myelogenous leukemia (CML) largely depends on the eradication of CML leukemic stem cells (LSCs). We recently showed that CML LSCs depend on Tcf1 and Lef1 factors for self-renewal. Using a connectivity map, we identified prostaglandin E1 (PGE1) as a small molecule that partly elicited the gene expression changes in LSCs caused by Tcf1/Lef1 deficiency. Although it has little impact on normal hematopoiesis, we found that PGE1 treatment impaired the persistence and activity of LSCs in a pre-clinical murine CML model and a xenograft model of transplanted CML patient CD34+ stem/progenitor cells. Mechanistically, PGE1 acted on the EP4 receptor and repressed Fosb and Fos AP-1 factors in a β-catenin-independent manner. Misoprostol, an FDA-approved EP4 agonist, conferred similar protection against CML. These findings suggest that activation of this PGE1-EP4 pathway specifically targets CML LSCs and that the combination of PGE1/misoprostol with conventional tyrosine-kinase inhibitors could provide effective therapy for CML.
A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating

Directory of Open Access Journals (Sweden)

Michalis P Michaelides

2010-10-01

Many studies have investigated the topic of change or drift in item parameter estimates in the context of Item Response Theory. Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.
A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

Science.gov (United States)

Michaelides, Michalis P

2010-01-01

Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.
A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

Directory of Open Access Journals (Sweden)

Eren Halil ÖZBERK

2017-03-01

In contrast with previous studies, this study employed various test designs (simple and complex), which allow the evaluation of the overall ability score estimations across multiple real test conditions. In this study, four factors were manipulated, namely the test design, number of items per dimension, correlation between dimensions and item selection methods. Using the generated item and ability parameters, dichotomous item responses were generated using the M3PL compensatory multidimensional IRT model with specified correlations. MCAT composite ability score accuracy was evaluated using absolute bias (ABSBIAS), correlation and the root mean square error (RMSE) between true and estimated ability scores. The results suggest that the multidimensional test structure, number of items per dimension and correlation between dimensions had a significant effect on item selection methods for the overall score estimations. For the simple structure test design, it was found that V1 item selection has the lowest absolute bias estimations for both long and short tests while estimating overall scores. As the model gets more complex, the KL item selection method performed better than the other two item selection methods.

Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

Science.gov (United States)

Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

2015-06-01

This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism.
Disrupting BCR-ABL in Combination with Secondary Leukemia-Specific Pathways in CML Cells Leads to Enhanced Apoptosis and Decreased Proliferation

OpenAIRE

Woessner, David W.; Lim, Carol S.

2012-01-01

Chronic myeloid leukemia (CML) is a myeloproliferative disorder caused by expression of the fusion gene BCR-ABL following a chromosomal translocation in the hematopoietic stem cell. Therapeutic management of CML uses tyrosine kinase inhibitors (TKIs), which block ABL-signaling and effectively kill peripheral cells with BCR-ABL. However, TKIs are not curative, and chronic use is required in order to treat CML. The primary failure for TKIs is through development of a resistant population d...
Detection of person misfit in computerized adaptive tests with polytomous items

NARCIS (Netherlands)

van Krimpen-Stoop, Edith; Meijer, R.R.

2000-01-01

Item scores that do not fit an assumed item response theory model may cause the latent trait value to be estimated inaccurately. For computerized adaptive tests (CAT) with dichotomous items, several person-fit statistics for detecting nonfitting item score patterns have been proposed. Both for
The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem

Directory of Open Access Journals (Sweden)

Phadungsukanan Weerapong

2012-08-01

This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.
The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem.

Science.gov (United States)

Phadungsukanan, Weerapong; Kraft, Markus; Townsend, Joe A; Murray-Rust, Peter

2012-08-07

This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.
Review of clinical, cytogenetic, and molecular aspects of Ph-negative CML

NARCIS (Netherlands)

D. van der Plas (D.); G.C. Grosveld (Gerard); A. Hagemeijer (Anne)

1991-01-01

Between 1985 and 1989, many cases of Philadelphia (Ph) chromosome negative chronic myelogenous leukemia (CML) were reported. For this review, the following selection criteria were used: the original articles on Ph-negative cases should provide clinical, hematologic,
Item Response Theory Modeling of the Philadelphia Naming Test

Science.gov (United States)

Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.

2015-01-01

Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…
Differential Weighting of Items to Improve University Admission Test Validity

Directory of Open Access Journals (Sweden)

Eduardo Backhoff Escudero

2001-05-01

This paper gives an evaluation of different ways to increase university admission test criterion-related validity, by differentially weighting test items. We compared four methods of weighting multiple-choice items of the Basic Skills and Knowledge Examination (EXHCOBA): (1) punishing incorrect responses by a constant factor, (2) weighting incorrect responses, considering the levels of error, (3) weighting correct responses, considering the item’s difficulty, based on the Classic Measurement Theory, and (4) weighting correct responses, considering the item’s difficulty, based on the Item Response Theory. Results show that none of these methods increased the instrument’s predictive validity, although they did improve its concurrent validity. It was concluded that it is appropriate to score the test by simply adding up correct responses.
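The contrast between simple number-right scoring and method (1), penalizing incorrect responses by a constant factor, can be sketched as follows. The penalty of 0.25 corresponds to the classical correction 1/(k - 1) for five-option items and is an assumption for illustration; the abstract does not state the factor or the number of options used for EXHCOBA.

```python
def number_right_score(responses, key):
    """Count of correct responses; omitted items (None) earn nothing."""
    return sum(1 for r, k in zip(responses, key) if r == k)

def penalized_score(responses, key, penalty=0.25):
    """Number right minus a constant penalty per incorrect (non-omitted) response."""
    right = number_right_score(responses, key)
    wrong = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return right - penalty * wrong

# Hypothetical four-item test: two correct, one wrong, one omitted.
key = ["A", "B", "C", "D"]
responses = ["A", "C", None, "D"]

simple = number_right_score(responses, key)
penalized = penalized_score(responses, key)
```

The two scores differ only for examinees who answer incorrectly rather than omit, so the penalty mainly changes the ranking of examinees who guess.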
Acadesine kills chronic myelogenous leukemia (CML) cells through PKC-dependent induction of autophagic cell death.

Directory of Open Access Journals (Sweden)

Guillaume Robert

CML is a hematopoietic stem cell disease characterized by the t(9;22)(q34;q11) translocation encoding the oncoprotein p210BCR-ABL. The effect of acadesine (AICAR, 5-aminoimidazole-4-carboxamide-1-beta-D-ribofuranoside), a compound with known antileukemic effect on B cell chronic lymphoblastic leukemia (B-CLL), was investigated in different CML cell lines. Acadesine triggered loss of cell metabolism in K562, LAMA-84 and JURL-MK1 and was also effective in killing imatinib-resistant K562 cells and Ba/F3 cells carrying the T315I-BCR-ABL mutation. The anti-leukemic effect of acadesine did not involve apoptosis but rather required induction of autophagic cell death. AMPK knock-down by Sh-RNA failed to prevent the effect of acadesine, indicating an AMPK-independent mechanism. The effect of acadesine was abrogated by GF109203X and Ro-32-0432, both inhibitors of classical and new PKCs, and accordingly, acadesine triggered relocation and activation of several PKC isoforms in K562 cells. In addition, this compound exhibited a potent anti-leukemic effect in clonogenic assays of CML cells in methyl cellulose and in a xenograft model of K562 cells in nude mice. In conclusion, our work identifies an original and unexpected mechanism by which acadesine triggers autophagic cell death through PKC activation. Therefore, in addition to its promising effects in B-CLL, acadesine might also be beneficial for imatinib-resistant CML patients.
SPARC expression in CML is associated to imatinib treatment and to inhibition of leukemia cell proliferation

Directory of Open Access Journals (Sweden)

Giallongo Cesarina

2013-02-01

Background: SPARC is a matricellular glycoprotein with growth-inhibitory and antiangiogenic activity in some cell types. The study of this protein in hematopoietic malignancies led to conflicting reports about its role as a tumor suppressor or promoter, depending on its different functions in the tumor microenvironment. In this study we investigated the variations in SPARC production by peripheral blood cells from chronic myeloid leukemia (CML) patients at diagnosis and after treatment and we identified the subpopulation of cells that are the prevalent source of SPARC. Methods: We evaluated SPARC expression using real-time PCR and western blotting. SPARC serum levels were detected by ELISA assay. Finally we analyzed the interaction between exogenous SPARC and imatinib (IM), in vitro, using ATP-lite and cell cycle analysis. Results: Our study shows that the CML cells of patients at diagnosis have a low mRNA and protein expression of SPARC. Low serum levels of this protein are also recorded in CML patients at diagnosis. However, after IM treatment we observed an increase of SPARC mRNA, protein, and serum level in the peripheral blood of these patients that had already started at 3 months and was maintained for at least the 18 months of observation. This SPARC increase was predominantly due to monocyte production. In addition, exogenous SPARC protein reduced the growth of K562 cell line and synergized in vitro with IM by inhibiting cell cycle progression from G1 to S phase. Conclusion: Our results suggest that low endogenous SPARC expression is a constant feature of BCR/ABL positive cells and that IM treatment induces SPARC overproduction by normal cells. This exogenous SPARC may inhibit CML cell proliferation and may synergize with IM activity against CML.
SPARC expression in CML is associated to imatinib treatment and to inhibition of leukemia cell proliferation

International Nuclear Information System (INIS)

Giallongo, Cesarina; Palumbo, Giuseppe A; Di Raimondo, Francesco; La Cava, Piera; Tibullo, Daniele; Barbagallo, Ignazio; Parrinello, Nunziatina; Cupri, Alessandra; Stagno, Fabio; Consoli, Carla; Chiarenza, Annalisa

2013-01-01

SPARC is a matricellular glycoprotein with growth-inhibitory and antiangiogenic activity in some cell types. The study of this protein in hematopoietic malignancies led to conflicting reports about its role as a tumor suppressor or promoter, depending on its different functions in the tumor microenvironment. In this study we investigated the variations in SPARC production by peripheral blood cells from chronic myeloid leukemia (CML) patients at diagnosis and after treatment and we identified the subpopulation of cells that are the prevalent source of SPARC. We evaluated SPARC expression using real-time PCR and western blotting. SPARC serum levels were detected by ELISA assay. Finally we analyzed the interaction between exogenous SPARC and imatinib (IM), in vitro, using ATP-lite and cell cycle analysis. Our study shows that the CML cells of patients at diagnosis have a low mRNA and protein expression of SPARC. Low serum levels of this protein are also recorded in CML patients at diagnosis. However, after IM treatment we observed an increase of SPARC mRNA, protein, and serum level in the peripheral blood of these patients that had already started at 3 months and was maintained for at least the 18 months of observation. This SPARC increase was predominantly due to monocyte production. In addition, exogenous SPARC protein reduced the growth of K562 cell line and synergized in vitro with IM by inhibiting cell cycle progression from G1 to S phase. Our results suggest that low endogenous SPARC expression is a constant feature of BCR/ABL positive cells and that IM treatment induces SPARC overproduction by normal cells. This exogenous SPARC may inhibit CML cell proliferation and may synergize with IM activity against CML
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

Science.gov (United States)

Lee, Yi-Hsuan; Zhang, Jinming

2017-01-01

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Science Literacy: How do High School Students Solve PISA Test Items?

Science.gov (United States)

Wati, F.; Sinaga, P.; Priyandoko, D.

2017-09-01

The Programme for International Student Assessment (PISA) assesses students’ science literacy in real-life contexts and a wide variety of situations. Therefore, the results do not provide adequate information for teachers to explore students’ science literacy, because the range of material taught at schools depends on the curriculum used. This study aims to investigate how junior high school students in Indonesia solve PISA test items. Data were collected by administering PISA test items from the greenhouse unit to 36 ninth-grade students. Students’ answers were analyzed qualitatively for each item based on the competence tested in the problem. The way students answer a problem exhibits their ability in a particular competence, which is influenced by a number of factors: students’ unfamiliarity with test construction, low reading performance, difficulty connecting the available information to the question, and limited ability to express their ideas effectively and readably. As a remedy, selected PISA test items can be used in accordance with the topic being taught to familiarize students with science literacy.
Design and prototyping of real-time systems using CSP and CML

DEFF Research Database (Denmark)

Rischel, Hans; Sun, Hong Yan

1997-01-01

A procedure for systematic design of event based systems is introduced by means of the Production Cell case study. The design is documented by CSP style processes, which allow both verification using formal techniques and also validation of a rapid prototype in the functional language CML...
Bayesian item selection criteria for adaptive testing

NARCIS (Netherlands)

van der Linden, Willem J.

1996-01-01

R.J. Owen (1975) proposed an approximate empirical Bayes procedure for item selection in adaptive testing. The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational
Item validity vs. item discrimination index: a redundancy?

Science.gov (United States)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated with a different formula for each. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testees’ scores on a particular item and their scores on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, since both reflect the quality of the test in measuring the examinees’ ability. In this paper, example results of data processing on item validity and the item discrimination index are compared. We discuss whether item validity and the item discrimination index can be represented by one of them only, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses where test analyses are included.
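The two quantities contrasted in this abstract can be computed side by side. The sketch below assumes the common upper-lower group definition of D (top and bottom 27% of examinees by total score) and an uncorrected item-total Pearson correlation as "item validity"; the paper's exact formulas are not given in the abstract, and the data are made up for illustration.

```python
import math

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = proportion correct in the upper group minus that in the lower group."""
    n = len(total_scores)
    k = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between item score (0/1) and total test score.

    Assumes both score lists have nonzero variance.
    """
    n = len(item_scores)
    mx = sum(item_scores) / n
    my = sum(total_scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    vx = sum((x - mx) ** 2 for x in item_scores)
    vy = sum((y - my) ** 2 for y in total_scores)
    return cov / math.sqrt(vx * vy)

# Hypothetical data: ten examinees, one dichotomous item.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

d_index = discrimination_index(item, totals)
r_item_total = item_total_correlation(item, totals)
```

Both statistics are high for this item, but D uses only the extreme groups while the correlation uses every examinee, so the two need not agree on real data.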
Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.

Science.gov (United States)

Zhu, Renbang; Yu, Feng; Liu, Su

A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

Science.gov (United States)

Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

2014-01-01

We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
“Preleukemic or smoldering” chronic myelogenous leukemia (CML): BCR-ABL1 positive: A brief case report

Directory of Open Access Journals (Sweden)

John M. Bennett

2015-01-01

The most common feature of CML is an elevated WBC count, usually above 25×10³/µL, and frequently above 100×10³/µL. We report a case of confirmed Ph+ CML with a normal CBC detected because of the presence of rare myelocytes and 2% basophils [Fig. 1]. Previous leukocyte counts for the preceding eight years were normal with the exception of one done four months prior to his presentation that showed an abnormal differential with 1% basophils, 2% metamyelocytes and 2% myelocytes.
Chloramphenicol Biosynthesis: The Structure of CmlS, a Flavin-Dependent Halogenase Showing a Covalent Flavin-Aspartate Bond

International Nuclear Information System (INIS)

Podzelinska, K.; Latimer, R.; Bhattacharya, A.; Vining, L.; Zechel, D.; Jia, Z.

2010-01-01

Chloramphenicol is a halogenated natural product bearing an unusual dichloroacetyl moiety that is critical for its antibiotic activity. The operon for chloramphenicol biosynthesis in Streptomyces venezuelae encodes the chloramphenicol halogenase CmlS, which belongs to the large and diverse family of flavin-dependent halogenases (FDHs). CmlS was previously shown to be essential for the formation of the dichloroacetyl group. Here we report the X-ray crystal structure of CmlS determined at 2.2 Å resolution, revealing a flavin monooxygenase domain shared by all FDHs, but also a unique 'winged-helix' C-terminal domain that creates a T-shaped tunnel leading to the halogenation active site. Intriguingly, the C-terminal tail of this domain blocks access to the halogenation active site, suggesting a structurally dynamic role during catalysis. The halogenation active site is notably nonpolar and shares nearly identical residues with Chondromyces crocatus tyrosyl halogenase (CndH), including the conserved Lys (K71) that forms the reactive chloramine intermediate. The exception is Y350, which could be used to stabilize enolate formation during substrate halogenation. The strictly conserved residue E44, located near the isoalloxazine ring of the bound flavin adenine dinucleotide (FAD) cofactor, is optimally positioned to function as a remote general acid, through a water-mediated proton relay, which could accelerate the reaction of the chloramine intermediate during substrate halogenation, or the oxidation of chloride by the FAD(C4α)-OOH intermediate. Strikingly, the 8α carbon of the FAD cofactor is observed to be covalently attached to D277 of CmlS, a residue that is highly conserved in the FDH family. In addition to representing a new type of flavin modification, this has intriguing implications for the mechanism of FDHs. Based on the crystal structure and in analogy to known halogenases, we propose a reaction mechanism for CmlS.

Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

Science.gov (United States)

Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

2018-02-01

The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Graded Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
Development of abbreviated eight-item form of the Penn Verbal Reasoning Test.

Science.gov (United States)

Bilker, Warren B; Wierzbicki, Michael R; Brensinger, Colleen M; Gur, Raquel E; Gur, Ruben C

2014-12-01

The ability to reason with language is a highly valued cognitive capacity that correlates with IQ measures and is sensitive to damage in language areas. The Penn Verbal Reasoning Test (PVRT) is a 29-item computerized test for measuring abstract analogical reasoning abilities using language. The full test can take over half an hour to administer, which limits its applicability in large-scale studies. We previously described a procedure for abbreviating a clinical rating scale and a modified procedure for reducing tests with a large number of items. Here we describe the application of the modified method to reducing the number of items in the PVRT to a parsimonious subset of items that accurately predicts the total score. As in our previous reduction studies, a split sample is used for model fitting and validation, with cross-validation to verify results. We find that an 8-item scale predicts the total 29-item score well, achieving a correlation of .9145 for the reduced form for the model fitting sample and .8952 for the validation sample. The results indicate that a drastically abbreviated version, which cuts administration time by more than 70%, can be safely administered as a predictor of PVRT performance. © The Author(s) 2014.
Development of Abbreviated Eight-Item Form of the Penn Verbal Reasoning Test

Science.gov (United States)

Bilker, Warren B.; Wierzbicki, Michael R.; Brensinger, Colleen M.; Gur, Raquel E.; Gur, Ruben C.

2014-01-01

The ability to reason with language is a highly valued cognitive capacity that correlates with IQ measures and is sensitive to damage in language areas. The Penn Verbal Reasoning Test (PVRT) is a 29-item computerized test for measuring abstract analogical reasoning abilities using language. The full test can take over half an hour to administer, which limits its applicability in large-scale studies. We previously described a procedure for abbreviating a clinical rating scale and a modified procedure for reducing tests with a large number of items. Here we describe the application of the modified method to reducing the number of items in the PVRT to a parsimonious subset of items that accurately predicts the total score. As in our previous reduction studies, a split sample is used for model fitting and validation, with cross-validation to verify results. We find that an 8-item scale predicts the total 29-item score well, achieving a correlation of .9145 for the reduced form for the model fitting sample and .8952 for the validation sample. The results indicate that a drastically abbreviated version, which cuts administration time by more than 70%, can be safely administered as a predictor of PVRT performance. PMID:24577310
Application of Item Response Theory to Tests of Substance-related Associative Memory

Science.gov (United States)

Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

2015-01-01

A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14- and 15-items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
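For readers unfamiliar with the 1PL and 2PL models named above, their item response functions can be written out directly. This is a minimal sketch of the standard logistic forms, not code from the study; the function names and the parameter values in the usage lines are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_1pl(theta, b):
    """One-parameter logistic model: the 2PL with a common discrimination,
    fixed here at 1.0 for illustration."""
    return p_2pl(theta, 1.0, b)

# Hypothetical item: difficulty 0.5, discrimination 1.7.
for theta in (-2.0, 0.0, 0.5, 2.0):
    print(theta, round(p_1pl(theta, 0.5), 3), round(p_2pl(theta, 1.7, 0.5), 3))
```

In both models the probability equals 0.5 when ability matches the item's difficulty; the 2PL additionally lets the steepness of the curve differ from item to item.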
A Preliminary Study of the Suitability of Archival Bone Marrow and Peripheral Blood Smears for Diagnosis of CML Using FISH

Directory of Open Access Journals (Sweden)

Alice Charwudzi

2014-01-01

Background. FISH is a molecular cytogenetic technique enabling rapid detection of genetic abnormalities. Facilities that can run fresh/wet samples for molecular diagnosis and monitoring of neoplastic disorders are not readily available in Ghana and other neighbouring countries. This study aims to demonstrate that interphase FISH can successfully be applied to archival methanol-fixed bone marrow and peripheral blood smear slides transported to a more equipped facility for molecular diagnosis of CML. Methods. Interphase FISH was performed on 22 archival methanol-fixed marrow (BM) and 3 peripheral blood (PB) smear slides obtained at diagnosis. The BM smears included 20 CML and 2 CMML cases diagnosed by morphology; the 3 PB smears were from 3 of the CML patients at the time of diagnosis. Six cases had known BCR-ABL fusion results at diagnosis by RQ-PCR. Full blood count reports at diagnosis were also retrieved. Result. 19 (95%) of the CML marrow smears demonstrated the BCR-ABL translocation. There was a significant correlation between the BCR-ABL transcript detected at diagnosis by RQ-PCR and that retrospectively detected by FISH from the aged BM smears at diagnosis (r=0.870; P=0.035). Conclusion. Archival methanol-fixed marrow and peripheral blood smears can be used to detect the BCR-ABL transcript for CML diagnosis.
The Efficacy of Reduced-dose Dasatinib as a Subsequent Therapy in Patients with Chronic Myeloid Leukemia in the Chronic Phase: The LD-CML Study of the Kanto CML Study Group

Science.gov (United States)

Iriyama, Noriyoshi; Ohashi, Kazuteru; Hashino, Satoshi; Kimura, Shinya; Nakaseko, Chiaki; Takano, Hina; Hino, Masayuki; Uchiyama, Michihiro; Morita, Satoshi; Sakamoto, Junichi; Sakamaki, Hisashi; Inokuchi, Koiti

2017-01-01

Objective The aim of this study was to prospectively investigate the efficacy and safety profiles of low-dose dasatinib therapy (50 mg once daily). Methods Patients with chronic myeloid leukemia in the chronic phase (CML-CP) who were being treated with low-dose imatinib (≤200 mg/day) but were resistant to this agent were enrolled in the current study (referred to as the LD-CML study). Results The subjects included 9 patients (4 men and 5 women); all were treated with dasatinib at a dose of 50 mg once daily. Among 8 patients who had not experienced major molecular response (MMR; BCR-ABL1 transcript ≤0.1% according to International Scale [IS]) at study enrollment, 5 attained MMR by 12 months. In particular, 3 of 9 patients demonstrated a deep molecular response (DMR; IS ≤0.0069%) by 18 months. Five patients developed lymphocytosis accompanied by cytotoxic lymphocyte predominance. There was no mortality or disease progression, and all continue to receive dasatinib therapy at 18 months with only 2 patients requiring dose reduction. Toxicities were mild-to-moderate, and pleural effusion was observed in 1 patient (grade 1). Conclusion Low-dose dasatinib can attain MMR and DMR without severe toxicity in patients with CML-CP who are unable to achieve MMR with low-dose imatinib. Switching to low-dose dasatinib should therefore be considered for patients in this setting, especially if they are otherwise considering a cessation of treatment. PMID:29033428
Group differences in the heritability of items and test scores

NARCIS (Netherlands)

Wicherts, J.M.; Johnson, W.

2009-01-01

It is important to understand potential sources of group differences in the heritability of intelligence test scores. On the basis of a basic item response model we argue that heritabilities which are based on dichotomous item scores normally do not generalize from one sample to the next. If groups
Assessing Differential Item Functioning on the Test of Relational Reasoning

Directory of Open Access Journals (Sweden)

Denis Dumas

2018-03-01

The test of relational reasoning (TORR) is designed to assess the ability to identify complex patterns within visuospatial stimuli. The TORR is designed for use in school and university settings, and therefore, its measurement invariance across diverse groups is critical. In this investigation, a large sample, representative of a major university on key demographic variables, was collected, and the resulting data were analyzed using a multi-group, multidimensional item-response theory model-comparison procedure. No significant differential item functioning was found on any of the TORR items across any of the demographic groups of interest. This finding is interpreted as evidence of the cultural fairness of the TORR, and potential test-development choices that may have contributed to that cultural fairness are discussed.
Development of a lack of appetite item bank for computer-adaptive testing (CAT)

DEFF Research Database (Denmark)

Thamsborg, Lise Laurberg Holst; Petersen, Morten Aa; Aaronson, Neil K

2015-01-01

to 12 lack of appetite items. CONCLUSIONS: Phases 1-3 resulted in 12 lack of appetite candidate items. Based on a field testing (phase 4), the psychometric characteristics of the items will be assessed and the final item bank will be generated. This CAT item bank is expected to provide precise...
Comparison of three different LCIA methods: EDIP97, CML2001 and Eco-indicator 99. Does it matter which one you choose

DEFF Research Database (Denmark)

Dreyer, Louise Camilla; Niemann, Anne Louise; Hauschild, Michael Zwicky

2003-01-01

?’ To investigate this issue, a comparison is performed of three frequently applied life cycle impact assessment methods. Methods. The three life cycle impact assessment methods EDIP97 (1), CML2001 (2) and Eco-indicator 99 (3) are compared on their performance through application to the same life cycle inventory...... of the EDIP97 and CML2001 output, differences up to two orders of magnitude are found for some of the indicator results for the impact categories describing toxicity to humans and ecosystems, and there is little similarity in the patterns of major contributors among the two methods. For human toxicity the CML......2001 score is dominated by contribution from metals while the EDIP97 score is caused by a solvent and nitrogen oxides. For aquatic ecotoxicity, metals are the main contributors for both methods but while it is vanadium for CML2001, it is strontium for EDIP97. After normalisation, the differences...
3CML: a software application for quality control of multi leaf collimators; 3CML: una aplicacion informatica para el control de calidad de colimadores multilaminas

Energy Technology Data Exchange (ETDEWEB)

Miras, H.; Perez, M. A.; Macias, J.; Moreno, J. C.; Campo, J. L.; Ortiz, M.; Arrans, R.; Ortiz, A.; Terron, J. A.; Fernandez, D.

2011-07-01

Intensity-modulated radiotherapy (IMRT) treatments require a thorough knowledge of the accuracy, precision and reproducibility of the positioning of the leaves that make up the multi-leaf collimator (MLC). We have developed a computer application, 3CML, that analyzes an image of an irradiated pattern of separate bands to determine the deviations of the leaf positions from their nominal values.
Algorithmic test design using classical item parameters

NARCIS (Netherlands)

van der Linden, Willem J.; Adema, Jos J.

Two optimization models for the construction of tests with a maximal value of coefficient alpha are given. Both models have a linear form and can be solved by using a branch-and-bound algorithm. The first model assumes an item bank calibrated under the Rasch model and can be used, for instance,
Detection of differential item functioning using Lagrange multiplier tests

NARCIS (Netherlands)

Glas, Cornelis A.W.

1998-01-01

Abstract: In the present paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or Rao’s efficient score test. The test is presented in the framework of a number of IRT models such as the Rasch model, the OPLM, the 2-parameter logistic model, the
HLA restriction of non-HLA-A, -B, -C and -D cell mediated lympholysis (CML)

International Nuclear Information System (INIS)

Goulmy, E.; Termijtelen, A.; Bradley, B.A.; Rood, J.J. van

1976-01-01

The aim of our study was to define target determinants other than those coded for by the classical HLA-A, -B, -C or -D loci which were responsible for killing in CML. In one of the families studied, strong evidence was found for the existence of a determinant coded for within the HLA region. CML was restricted to targets carrying the classical HLA-Bw35 and Cw4 determinants but the targets were neither HLA-Bw35 nor Cw4 themselves. We therefore concluded that this new HLA determinant was either the product of a new locus closely associated with HLA-B or that it was a product of the classical HLA-B locus which has not been recognized by serology. (author)
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

Science.gov (United States)

Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

2010-01-01

This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

Science.gov (United States)

Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

2014-01-01

Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

Science.gov (United States)

Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

2014-05-01

The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test

Science.gov (United States)

Kahraman, Nilüfer

2014-01-01

Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…
Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

Science.gov (United States)

Ilich, Maria O.

Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets (groups of items referring to a scientific scenario) to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

Science.gov (United States)

Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

2014-01-01

Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
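The abstract describes CAT as selecting "the best items" given earlier responses. One common selection rule, shown here only as an illustration, picks the unused item with the largest Fisher information at the current ability estimate. The sketch assumes a dichotomous 2PL model, whereas PROMIS items are polytomous and its actual selection algorithm is not described in the abstract; the item bank values and function names are hypothetical.

```python
import math

def prob(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Return the index of the not-yet-administered item that is most
    informative at the current ability estimate."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: information(theta, *bank[i]))

# Hypothetical bank of (discrimination, difficulty) pairs.
bank = [(1.0, -1.0), (1.8, 0.2), (0.7, 0.0), (1.5, 2.0)]

first = select_next_item(0.0, bank, administered=set())
second = select_next_item(0.0, bank, administered={first})
print(first, second)
```

In a full CAT loop the ability estimate would be updated after each response before the next selection; that step is omitted here to keep the sketch short.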

Hydroxychavicol, a Piper betle leaf component, induces apoptosis of CML cells through mitochondrial reactive oxygen species-dependent JNK and endothelial nitric oxide synthase activation and overrides imatinib resistance.

Science.gov (United States)

Chakraborty, Jayashree B; Mahato, Sanjit K; Joshi, Kalpana; Shinde, Vaibhav; Rakshit, Srabanti; Biswas, Nabendu; Choudhury Mukherjee, Indrani; Mandal, Labanya; Ganguly, Dipyaman; Chowdhury, Avik A; Chaudhuri, Jaydeep; Paul, Kausik; Pal, Bikas C; Vinayagam, Jayaraman; Pal, Churala; Manna, Anirban; Jaisankar, Parasuraman; Chaudhuri, Utpal; Konar, Aditya; Roy, Siddhartha; Bandyopadhyay, Santu

2012-01-01

Alcoholic extract of Piper betle (Piper betle L.) leaves was recently found to induce apoptosis of CML cells expressing wild type and mutated Bcr-Abl with imatinib resistance phenotype. Hydroxy-chavicol (HCH), a constituent of the alcoholic extract of Piper betle leaves, was evaluated for anti-CML activity. Here, we report that HCH and its analogues induce killing of primary cells in CML patients and leukemic cell lines expressing wild type and mutated Bcr-Abl, including the T315I mutation, with minimal toxicity to normal human peripheral blood mononuclear cells. HCH causes early but transient increase of mitochondria-derived reactive oxygen species. Reactive oxygen species-dependent persistent activation of JNK leads to an increase in endothelial nitric oxide synthase-mediated nitric oxide generation. This causes loss of mitochondrial membrane potential, release of cytochrome c from mitochondria, cleavage of caspase 9, 3 and poly-adenosine diphosphate-ribose polymerase leading to apoptosis. One HCH analogue was also effective in vivo in SCID mice against grafts expressing the T315I mutation, although to a lesser extent than grafts expressing wild type Bcr-Abl, without showing significant bodyweight loss. Our data describe the role of JNK-dependent endothelial nitric oxide synthase-mediated nitric oxide for anti-CML activity of HCH and this molecule merits further testing in pre-clinical and clinical settings. © 2011 Japanese Cancer Association.
Science Library of Test Items. Volume Eight. Mastery Testing Program. Series 3 & 4 Supplements to Introduction and Manual.

Science.gov (United States)

New South Wales Dept. of Education, Sydney (Australia).

Continuing a series of short tests aimed at measuring student mastery of specific skills in the natural sciences, this supplementary volume includes teachers' notes, a users' guide and inspection copies of test items 27 to 50. Answer keys and test scoring statistics are provided. The items are designed for grades 7 through 10, and a list of the…
Cyclopiamines C and D: Epoxide Spiroindolinone Alkaloids from Penicillium sp. CML 3020

DEFF Research Database (Denmark)

Kildgaard, Sara; de Medeiros, Lívia S; Phillips, Emma

2018-01-01

Cyclopiamines C (1) and D (2) were isolated from the extract of Penicillium sp. CML 3020, a fungus sourced from an Atlantic Forest soil sample. Their structures and relative configuration were determined by 1D and 2D NMR, HRMS, and UV/vis data analysis. Cyclopiamines C and D belong to a small...
An empirical comparison of Item Response Theory and Classical Test Theory

Directory of Open Access Journals (Sweden)

Špela Progar

2008-11-01

Based on nonlinear models between the measured latent variable and the item response, item response theory (IRT) enables independent estimation of item and person parameters and local estimation of measurement error. These properties of IRT are also the main theoretical advantages of IRT over classical test theory (CTT). Empirical evidence, however, often failed to discover consistent differences between IRT and CTT parameters and between invariance measures of CTT and IRT parameter estimates. In this empirical study a real data set from the Third International Mathematics and Science Study (TIMSS 1995) was used to address the following questions: (1) How comparable are CTT and IRT based item and person parameters? (2) How invariant are CTT and IRT based item parameters across different participant groups? (3) How invariant are CTT and IRT based item and person parameters across different item sets? The findings indicate that the CTT and the IRT item/person parameters are very comparable, that the CTT and the IRT item parameters show similar invariance property when estimated across different groups of participants, that the IRT person parameters are more invariant across different item sets, and that the CTT item parameters are at least as much invariant in different item sets as the IRT item parameters. The results furthermore demonstrate that, with regards to the invariance property, IRT item/person parameters are in general empirically superior to CTT parameters, but only if the appropriate IRT model is used for modelling the data.
Stochastic order in dichotomous item response models for fixed tests, research adaptive tests, or multiple abilities

NARCIS (Netherlands)

van der Linden, Willem J.

1995-01-01

Dichotomous item response theory (IRT) models can be viewed as families of stochastically ordered distributions of responses to test items. This paper explores several properties of such distributions. The focus is on the conditions under which stochastic order in families of conditional
The Role of Item Models in Automatic Item Generation

Science.gov (United States)

Gierl, Mark J.; Lai, Hollis

2012-01-01

Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Overcoming the effects of differential skewness of test items in scale construction

Directory of Open Access Journals (Sweden)

Johann M. Schepers

2004-10-01

Full Text Available The principal objective of the study was to develop a procedure for overcoming the effects of differential skewness of test items in scale construction. It was shown that the degree of skewness of test items places an upper limit on the correlations between the items, regardless of the contents of the items. If the items are ordered in terms of skewness, the resulting intercorrelation matrix forms a simplex or a pseudo-simplex. Factoring such a matrix results in a multiplicity of factors, most of which are artifacts. A procedure for overcoming this problem was demonstrated with items from the Locus of Control Inventory (Schepers, 1995). The analysis was based on a sample of 1662 first-year university students.
Item difficulty of multiple choice tests dependant on different item response formats – An experiment in fundamental research on psychological assessment

Directory of Open Access Journals (Sweden)

KLAUS D. KUBINGER

2007-12-01

Full Text Available Multiple choice response formats are problematical as an item is often scored as solved simply because the test-taker is a lucky guesser. Instead of applying pertinent IRT models which take guessing effects into account, a pragmatic approach of re-conceptualizing multiple choice response formats to reduce the chance of lucky guessing is considered. This paper compares the free response format with two different multiple choice formats. A common multiple choice format with a single correct response option and five distractors (“1 of 6”) is used, as well as a multiple choice format with five response options, of which any number of the five is correct and the item is only scored as mastered if all the correct response options and none of the wrong ones are marked (“x of 5”). An experiment was designed, using pairs of items with exactly the same content but different response formats. 173 test-takers were randomly assigned to two test booklets of 150 items altogether. Rasch model analyses adduced a fitting item pool, after the deletion of 39 items. The resulting item difficulty parameters were used for the comparison of the different formats. The multiple choice format “1 of 6” differs significantly from “x of 5”, with a relative effect of 1.63, while the multiple choice format “x of 5” does not significantly differ from the free response format. Therefore, the lower degree of difficulty of items with the “1 of 6” multiple choice format is an indicator of relevant guessing effects. In contrast, the “x of 5” multiple choice format can be seen as an appropriate substitute for the free response format.
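The guessing argument in this abstract can be made concrete with a small calculation. The sketch below is not taken from the paper: it assumes a test-taker who guesses blindly and uniformly at random, and, for the "x of 5" format, that every non-empty marking pattern over the five options is equally likely. The function names are illustrative only.

```python
from fractions import Fraction


def p_guess_single(n_options: int) -> Fraction:
    """Chance of a lucky guess when exactly one of n_options is correct."""
    return Fraction(1, n_options)


def p_guess_multi(n_options: int, allow_empty: bool = False) -> Fraction:
    """Chance of marking exactly the right subset of n_options by blind guessing.

    Assumption: each admissible marking pattern is equally likely. By default
    the empty pattern (nothing marked) is excluded.
    """
    patterns = 2 ** n_options - (0 if allow_empty else 1)
    return Fraction(1, patterns)


print("1 of 6:", p_guess_single(6))
print("x of 5:", p_guess_multi(5))
```

Under these assumptions the chance of a lucky guess drops from 1/6 to 1/31, which is in line with the abstract's finding that "x of 5" behaves much like the free response format.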
International development of an EORTC questionnaire for assessing health-related quality of life in chronic myeloid leukemia patients: the EORTC QLQ-CML24

NARCIS (Netherlands)

Efficace, Fabio; Baccarani, Michele; Breccia, Massimo; Saussele, Susanne; Abel, Gregory; Caocci, Giovanni; Guilhot, Francois; Cocks, Kim; Naeem, Adel; Sprangers, Mirjam; Oerlemans, Simone; Chie, Weichu; Castagnetti, Fausto; Bombaci, Felice; Sharf, Giora; Cardoni, Annarita; Noens, Lucien; Pallua, Stephan; Salvucci, Marzia; Nicolatou-Galitis, Ourania; Rosti, Gianantonio; Mandelli, Franco

2014-01-01

Background Health-related quality of life (HRQOL) is a key aspect for chronic myeloid leukemia (CML) patients. The aim of this study was to develop a disease-specific HRQOL questionnaire for patients with CML to supplement the European Organization for Research and Treatment of Cancer (EORTC)-QLQ
Latent Trait Theory Applications to Test Item Bias Methodology. Research Memorandum No. 1.

Science.gov (United States)

Osterlind, Steven J.; Martois, John S.

This study discusses latent trait theory applications to test item bias methodology. A real data set is used in describing the rationale and application of the Rasch probabilistic model item calibrations across various ethnic group populations. A high school graduation proficiency test covering reading comprehension, writing mechanics, and…
Fostering a student's skill for analyzing test items through an authentic task

Science.gov (United States)

Setiawan, Beni; Sabtiawan, Wahyu Budi

2017-08-01

Analyzing test items is a skill that prospective teachers must master in order to judge the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items with respect to validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students of the science education study program, science and mathematics faculty, Universitas Negeri Surabaya, enrolled in an assessment course. The research design was a one-group posttest design. As the treatment, the students were given an authentic task in which they developed test items and then analyzed the items like a professional assessor using Microsoft Excel and Anates software. The data were analyzed descriptively: the students' skill data were displayed and then related to theories or previous empirical studies. The research showed that the task helped the students acquire the skills. Thirty-one students got a perfect score for the analysis, five students achieved 97% mastery, two students had 92% mastery, and another two students got 89% and 79% mastery. The implication of the finding is that students who are given authentic tasks requiring them to perform like professionals are more likely to achieve professional skills by the end of learning.
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory

Directory of Open Access Journals (Sweden)

Tomasz Smoleń

2015-08-01

Full Text Available Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf test, the Raven test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

Science.gov (United States)

Smolen, Tomasz; Chuderski, Adam

2015-01-01

Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
Optimizing the Use of Response Times for Item Selection in Computerized Adaptive Testing

Science.gov (United States)

Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua

2018-01-01

Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…
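A criterion of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it assumes a two-parameter logistic (2PL) response model for the Fisher information and a lognormal response-time model in which log time is normal with mean `beta - tau` and standard deviation `1 / alpha`. The `Item` class, the parameter names, and the two-item pool are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    a: float      # discrimination (2PL, assumed)
    b: float      # difficulty (2PL, assumed)
    beta: float   # time intensity on the log-seconds scale (assumed)
    alpha: float  # time discrimination (assumed)


def fisher_info(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


def expected_time(item: Item, tau: float) -> float:
    """Mean of a lognormal response time for a test-taker with speed tau."""
    return math.exp(item.beta - tau + 1.0 / (2.0 * item.alpha ** 2))


def select_item(pool: list[Item], theta: float, tau: float) -> Item:
    """Pick the item with the most information per unit of expected time."""
    return max(pool, key=lambda it: fisher_info(it, theta) / expected_time(it, tau))


# Hypothetical pool: B is more informative, but much slower to answer.
pool = [
    Item("A", a=1.0, b=0.0, beta=4.0, alpha=2.0),
    Item("B", a=1.2, b=0.0, beta=5.0, alpha=2.0),
]
print(select_item(pool, theta=0.0, tau=0.0).name)
```

With this pool, plain maximum-information selection would prefer item B, while the time-adjusted ratio prefers item A, which is the trade-off the abstract describes.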
Arabidopsis calmodulin-like protein CML36 is a calcium (Ca2+) sensor that interacts with the plasma membrane Ca2+-ATPase isoform ACA8 and stimulates its activity.

Science.gov (United States)

Astegno, Alessandra; Bonza, Maria Cristina; Vallone, Rosario; La Verde, Valentina; D'Onofrio, Mariapina; Luoni, Laura; Molesini, Barbara; Dominici, Paola

2017-09-08

Calmodulin-like (CML) proteins are major EF-hand-containing, calcium (Ca2+)-binding proteins with crucial roles in plant development and in coordinating plant stress tolerance. Given their abundance in plants, the properties of Ca2+ sensors and identification of novel target proteins of CMLs deserve special attention. To this end, we recombinantly produced and biochemically characterized CML36 from Arabidopsis thaliana. We analyzed Ca2+ and Mg2+ binding to the individual EF-hands, observed metal-induced conformational changes, and identified a physiologically relevant target. CML36 possesses two high-affinity Ca2+/Mg2+ mixed binding sites and two low-affinity Ca2+-specific sites. Binding of Ca2+ induced an increase in the α-helical content and a conformational change that led to the exposure of hydrophobic regions responsible for target protein recognition. Cation binding, either Ca2+ or Mg2+, stabilized the secondary and tertiary structures of CML36, guiding a large structural transition from a molten globule apo-state to a compact holoconformation. Importantly, through in vitro binding and activity assays, we showed that CML36 interacts directly with the regulative N terminus of the Arabidopsis plasma membrane Ca2+-ATPase isoform 8 (ACA8) and that this interaction stimulates ACA8 activity. Gene expression analysis revealed that CML36 and ACA8 are co-expressed mainly in inflorescences. Collectively, our results support a role for CML36 as a Ca2+ sensor that binds to and modulates ACA8, uncovering a possible involvement of the CML protein family in the modulation of plant-autoinhibited Ca2+ pumps. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Two cases of chronic myelogenous leukemia (CML) treated with Iminitab (Glivec) in different phases

International Nuclear Information System (INIS)

Davoli, R.; Ciarlo, S.; Acosta, I.; Perez, S.; Lagorio, S.; Pratti, A.A.

2003-01-01

Full text: Imatinib, an inhibitor of cytoplasmic signal transduction that hinders neoplastic cell growth, is a new therapeutic agent for chronic myelogenous leukemia (CML). It is a BCR-ABL tyrosine kinase inhibitor that also inhibits the c-kit receptor protein in gastrointestinal neoplasia and small cell lung cancer. The aim of the present work was to evaluate the effect of this agent in CML patients in two different phases, namely the chronic phase and the acute one. We hereby present two patients: 1) a 48-year-old patient with a history of radioactive contamination, and 2) a 19-year-old patient. In both cases diagnosis was confirmed by BM and BM biopsy, neutrophil alkaline phosphatase, and Ph chromosome t(9;22)(q34;q11). There were no compatible BM donors available. Both patients were treated with hydroxyurea and hydroxyurea plus interferon, with ARAC added in one of them. Since there was no favorable response, an imatinib course was started. Patient (2), in blastic crisis, remitted for 12 months until subsequent relapse and death. Patient (1), treated during the chronic phase, is still in remission. Neither of them became Ph chromosome negative. Up to now, current reports show a high percentage of relapse in patients treated during the acute phase, while those treated in the chronic phase present fewer relapses. The importance of follow-up during the chronic phase should be noted, given the short time the drug has been in use in our country (since May 2001). Good tolerance and sustained remission in CML patients allow optimism regarding this therapeutic agent. (author)
Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

Science.gov (United States)

Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

2015-12-01

The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

Science.gov (United States)

Alsadaawi, Abdullah Saleh

2017-01-01

The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
The effects of linguistic modification on ESL students' comprehension of nursing course test items.

Science.gov (United States)

Bosher, Susan; Bowles, Melissa

2008-01-01

Recent research has indicated that language may be a source of construct-irrelevant variance for non-native speakers of English, or English as a second language (ESL) students, when they take exams. As a result, exams may not accurately measure knowledge of nursing content. One accommodation often used to level the playing field for ESL students is linguistic modification, a process by which the reading load of test items is reduced while the content and integrity of the item are maintained. Research on the effects of linguistic modification has been conducted on examinees in the K-12 population, but is just beginning in other areas. This study describes the collaborative process by which items from a pathophysiology exam were linguistically modified and subsequently evaluated for comprehensibility by ESL students. Findings indicate that in a majority of cases, modification improved examinees' comprehension of test items. Implications for test item writing and future research are discussed.

Prediction of true test scores from observed item scores and ancillary data.

Science.gov (United States)

Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

2015-05-01

In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
3CML: a software application for quality control of multi leaf collimators

International Nuclear Information System (INIS)

Miras, H.; Perez, M. A.; Macias, J.; Moreno, J. C.; Campo, J. L.; Ortiz, M.; Arrans, R.; Ortiz, A.; Terron, J. A.; Fernandez, D.

2011-01-01

Intensity modulated radiotherapy (IMRT) treatments require detailed knowledge of the accuracy, precision and reproducibility with which the leaves of the multi leaf collimator (MLC) are positioned. We have developed a computer application, 3CML, that analyzes an image of an irradiated pattern of separate bands to determine the deviations of the leaf positions from their nominal values.
Effects of Reducing the Cognitive Load of Mathematics Test Items on Student Performance

Directory of Open Access Journals (Sweden)

Susan C. Gillmor

2015-01-01

Full Text Available This study explores a new item-writing framework for improving the validity of math assessment items. The authors transfer insights from Cognitive Load Theory (CLT), traditionally used in instructional design, to educational measurement. Fifteen multiple-choice math assessment items were modified using research-based strategies for reducing extraneous cognitive load. An experimental design with 222 middle-school students tested the effects of the reduced cognitive load items on student performance and anxiety. Significant findings confirm the main research hypothesis that reducing the cognitive load of math assessment items improves student performance. Three load-reducing item modifications are identified as particularly effective for reducing item difficulty: signalling important information, aesthetic item organization, and removing extraneous content. Load reduction was not shown to impact student anxiety. Implications for classroom assessment and future research are discussed.
Sequential Use of Second-Generation Tyrosine Kinase Inhibitor Treatment and Intensive Chemotherapy Induced Long-Term Complete Molecular Response in Imatinib-Resistant CML Patient Presenting as a Myeloid Blast Crisis

Directory of Open Access Journals (Sweden)

Masaaki Tsuji

2017-01-01

Full Text Available Myeloid blast crisis of chronic myeloid leukemia (CML-MBC) is rarely seen at presentation and has a poor prognosis. There is no standard therapy for CML-MBC. It is often difficult to distinguish CML-MBC from acute myeloid leukemia expressing the Philadelphia chromosome (Ph+ AML). We present a case in which CML-MBC was seen at the initial presentation in a 75-year-old male. He was treated with conventional AML-directed chemotherapy followed by imatinib mesylate monotherapy, which failed to induce response. However, he achieved long-term complete molecular response after combination therapy involving dasatinib, a second-generation tyrosine kinase inhibitor, and conventional chemotherapy.
Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models

Science.gov (United States)

Woods, Carol M.; Grimm, Kevin J.

2011-01-01

In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…
Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

Science.gov (United States)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-01-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
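For readers unfamiliar with the three-parameter logistic (3PL) model named in this abstract, the item response function can be written out directly. The sketch below uses the standard 3PL form without the optional scaling constant D = 1.7, which is an assumption here; the abstract does not say which parameterization the study used. The function name and the item parameters are hypothetical.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: test-taker ability
    a: item discrimination
    b: item difficulty
    c: guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty, 20% guessing.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=0.2), 3))
```

At `theta == b` the probability sits halfway between the guessing floor `c` and 1, and for very low ability it approaches `c` rather than 0, which is how the model absorbs lucky guesses on multiple-choice items.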
HLA-DRB1*16-restricted recognition of myeloid cells, including CD34+ CML progenitor cells

NARCIS (Netherlands)

Ebeling, Saskia B.; Ivanov, Roman; Hol, Samantha; Aarts, Tineke I.; Hagenbeek, Anton; Verdonck, Leo F.; Petersen, Eefke J.

2003-01-01

The therapeutic effect of a human leucocyte antigen (HLA)-identical allogeneic stem cell transplantation (allo-SCT) for the treatment of haematological malignancies is mediated partly by the allogeneic T cells that are administered together with the stem cell graft. Chronic myeloid leukaemia (CML)
IRT-Estimated Reliability for Tests Containing Mixed Item Formats

Science.gov (United States)

Shu, Lianghua; Schwarz, Richard D.

2014-01-01

As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…
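Of the four coefficients listed, Cronbach's alpha has the simplest classical sample formula: k / (k - 1) times one minus the ratio of summed item variances to total-score variance. The sketch below computes that observed-score version, not the IRT-estimated version derived in the article. The function name and the score matrix are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Classical Cronbach's alpha; rows are test-takers, columns are items."""
    k = len(scores[0])
    # Sample variance of each item across test-takers.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total (sum) scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical scores for five test-takers on three items.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))
```

When all items rank the test-takers identically, the coefficient reaches 1, and it falls as the items share less of their variance.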
Applications of NLP Techniques to Computer-Assisted Authoring of Test Items for Elementary Chinese

Science.gov (United States)

Liu, Chao-Lin; Lin, Jen-Hsiang; Wang, Yu-Chun

2010-01-01

The authors report an implemented environment for computer-assisted authoring of test items and provide a brief discussion about the applications of NLP techniques for computer assisted language learning. Test items can serve as a tool for language learners to examine their competence in the target language. The authors apply techniques for…
Assessment of imatinib as first-line treatment of chronic myeloid leukemia: 10-year survival results of the randomized CML study IV and impact of non-CML determinants.

Science.gov (United States)

Hehlmann, R; Lauseker, M; Saußele, S; Pfirrmann, M; Krause, S; Kolb, H J; Neubauer, A; Hossfeld, D K; Nerl, C; Gratwohl, A; Baerlocher, G M; Heim, D; Brümmendorf, T H; Fabarius, A; Haferlach, C; Schlegelberger, B; Müller, M C; Jeromin, S; Proetel, U; Kohlbrenner, K; Voskanyan, A; Rinaldetti, S; Seifarth, W; Spieß, B; Balleisen, L; Goebeler, M C; Hänel, M; Ho, A; Dengler, J; Falge, C; Kanz, L; Kremers, S; Burchert, A; Kneba, M; Stegelmann, F; Köhne, C A; Lindemann, H W; Waller, C F; Pfreundschuh, M; Spiekermann, K; Berdel, W E; Müller, L; Edinger, M; Mayer, J; Beelen, D W; Bentz, M; Link, H; Hertenstein, B; Fuchs, R; Wernli, M; Schlegel, F; Schlag, R; de Wit, M; Trümper, L; Hebart, H; Hahn, M; Thomalla, J; Scheid, C; Schafhausen, P; Verbeek, W; Eckart, M J; Gassmann, W; Pezzutto, A; Schenk, M; Brossart, P; Geer, T; Bildat, S; Schäfer, E; Hochhaus, A; Hasford, J

2017-11-01

Chronic myeloid leukemia (CML)-study IV was designed to explore whether treatment with imatinib (IM) at 400 mg/day (n=400) could be optimized by doubling the dose (n=420), adding interferon (IFN) (n=430) or cytarabine (n=158) or using IM after IFN-failure (n=128). From July 2002 to March 2012, 1551 newly diagnosed patients in chronic phase were randomized into a 5-arm study. The study was powered to detect a survival difference of 5% at 5 years. After a median observation time of 9.5 years, 10-year overall survival was 82%, 10-year progression-free survival was 80% and 10-year relative survival was 92%. Survival between IM400 mg and any experimental arm was not different. In a multivariate analysis, risk group, major-route chromosomal aberrations, comorbidities, smoking and treatment center (academic vs other) influenced survival significantly, but not any form of treatment optimization. Patients reaching the molecular response milestones at 3, 6 and 12 months had a significant survival advantage. For responders, monotherapy with IM400 mg provides a close to normal life expectancy independent of the time to response. Survival is more determined by patients' and disease factors than by initial treatment selection. Although improvements are also needed for refractory disease, more life-time can currently be gained by carefully addressing non-CML determinants of survival.
Redefining diagnostic symptoms of depression using Rasch analysis: testing an item bank suitable for DSM-V and computer adaptive testing.

Science.gov (United States)

Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S

2011-10-01

We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank, and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items, accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing, and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
Relationships among Classical Test Theory and Item Response Theory Frameworks via Factor Analytic Models

Science.gov (United States)

Kohli, Nidhi; Koran, Jennifer; Henn, Lisa

2015-01-01

There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…
Using response-time constraints in item selection to control for differential speededness in computerized adaptive testing

NARCIS (Netherlands)

van der Linden, Willem J.; Scrams, David J.; Schnipke, Deborah L.

2003-01-01

This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has
Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

Science.gov (United States)

Smith, Clifton L.; And Others

This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…
Item-focussed Trees for the Identification of Items in Differential Item Functioning.

Science.gov (United States)

Tutz, Gerhard; Berger, Moritz

2016-09-01

A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Test Score Equating Using Discrete Anchor Items versus Passage-Based Anchor Items: A Case Study Using "SAT"® Data. Research Report. ETS RR-14-14

Science.gov (United States)

Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill

2014-01-01

The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…
Development of an item bank for computerized adaptive test (CAT) measurement of pain

DEFF Research Database (Denmark)

Petersen, Morten Aa.; Aaronson, Neil K; Chie, Wei-Chu

2016-01-01

PURPOSE: Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured ... were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. ... CONCLUSIONS: We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ...
Evaluation of multielements in human serum of patients with chronic myelogenous leukemia (CML) using SRTXRF; Avaliacao multielementar em soro humano de individuos portadores de leucemia mieloide cronica (LMC) usando SRTXRF

Energy Technology Data Exchange (ETDEWEB)

Leitao, Catarine Canellas Gondim

2005-04-15

In this work, trace elements were analyzed in serum of patients with chronic myelogenous leukemia (CML) by Total Reflection X-Ray Fluorescence using synchrotron radiation (SRTXRF). CML affects the myeloid cells in the blood, occurs in 1 to 2 people per 100,000, and accounts for 7-20% of leukemia cases. Sixty patients with CML and sixty healthy volunteers (control group) were studied. Blood was collected into vacutainers without additives. Directly after collection, each blood sample was centrifuged at 3000 rev/min for 10 min in order to separate blood cells and suspended particles from blood serum. Sera were transferred into polyethylene tubes and stored in a freezer at 253 K. A 500 μL serum quantity was spiked with Ga (50 μL) as internal standard. 10 μL aliquots were pipetted onto a Perspex sample carrier. After deposition, the samples were left to dry under an infrared lamp. The measurements were performed at the X-Ray Fluorescence Beamline at the Brazilian National Synchrotron Light Laboratory (LNLS), using a polychromatic beam. Standard solutions with gallium as internal standard were prepared for system calibration. It was possible to determine the concentrations of the following elements: P, S, Cl, K, Ca, Cr, Mn, Fe, Ni, Cu, Zn, Br and Rb. The ANOVA test showed that the elements P, S, Ca, Cr, Mn, Fe, Cu and Rb presented significant differences (α = 0.05) between groups (healthy subjects and CML patients) and between sexes (males and females). (author)
A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing

Science.gov (United States)

Weissman, Alexander

2006-01-01

A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level ([theta]) estimation and vice versa. When discrepancies exist between an examinee's estimated and true [theta] levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with…
Australian Biology Test Item Bank, Years 11 and 12. Volume II: Year 12.

Science.gov (United States)

Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

Australian Biology Test Item Bank, Years 11 and 12. Volume I: Year 11.

Science.gov (United States)

Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…
In vitro evaluation of digestive and endolysosomal enzymes to cleave CML-modified Ara h 1 peptides

Science.gov (United States)

The sensory, biological, chemical, and immunological characteristics of foods can be modified non-enzymatically during processing. Notably, these modifications may modulate the allergenic potency of food allergens, such as the Ara h 1 peanut allergen. Carboxymethyl-lysine (CML) modification is a p...
Do Self Concept Tests Test Self Concept? An Evaluation of the Validity of Items on the Piers Harris and Coopersmith Measures.

Science.gov (United States)

Lynch, Mervin D.; Chaves, John

Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…
Why Students Answer TIMSS Science Test Items the Way They Do

Science.gov (United States)

Harlow, Ann; Jones, Alister

2004-04-01

The purpose of this study was to explore how Year 8 students answered Third International Mathematics and Science Study (TIMSS) questions and whether the test questions represented the scientific understanding of these students. One hundred and seventy-seven students were tested using written test questions taken from the science test used in the Third International Mathematics and Science Study. The degree to which a sample of 38 children represented their understanding of the topics in a written test compared to the level of understanding that could be elicited by an interview is presented in this paper. In exploring student responses in the interview situation this study hoped to gain some insight into the science knowledge that students held and whether or not the test items had been able to elicit this knowledge successfully. We question the usefulness and quality of data from large-scale summative assessments on their own to represent student scientific understanding and conclude that large scale written test items, such as TIMSS, on their own are not a valid way of exploring students' understanding of scientific concepts. Considerable caution is therefore needed in exploiting the outcomes of international achievement testing when considering educational policy changes or using TIMSS data on their own to represent student understanding.
Strategies for Controlling Item Exposure in Computerized Adaptive Testing with the Generalized Partial Credit Model

Science.gov (United States)

Davis, Laurie Laughlin

2004-01-01

Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline…
Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement.

Science.gov (United States)

McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H

2018-01-23

Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. The 27-item
International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

Science.gov (United States)

Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

2016-01-01

We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
Item level diagnostics and model - data fit in item response theory ...

African Journals Online (AJOL)

Item response theory (IRT) is a framework for modeling and analyzing item response data. Item-level modeling gives IRT advantages over classical test theory. The fit of an item score pattern to an item response theory (IRT) model is a necessary condition that must be assessed for further use of item and models that best fit ...
The Technical Quality of Test Items Generated Using a Systematic Approach to Item Writing.

Science.gov (United States)

Siskind, Theresa G.; Anderson, Lorin W.

The study was designed to examine the similarity of response options generated by different item writers using a systematic approach to item writing. The similarity of response options to student responses for the same item stems presented in an open-ended format was also examined. A non-systematic (subject matter expertise) approach and a…
Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

Science.gov (United States)

Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

2015-08-19

The four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms
Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics

Directory of Open Access Journals (Sweden)

Frank Goldhammer

2017-11-01

Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods: Analyses were based on the Canadian sample of Round 1 of the PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results: The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant’s native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item’s location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions: The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement).
Easy and Informative: Using Confidence-Weighted True-False Items for Knowledge Tests in Psychology Courses

Science.gov (United States)

Dutke, Stephan; Barenberg, Jonathan

2015-01-01

We introduce a specific type of item for knowledge tests, confidence-weighted true-false (CTF) items, and review experiences of its application in psychology courses. A CTF item is a statement about the learning content to which students respond whether the statement is true or false, and they rate their confidence level. Previous studies using…
A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

Science.gov (United States)

Lee, Guemin; Park, In-Yong

2012-01-01

Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
Use of differential item functioning (DIF) analysis for bias analysis in test construction

Directory of Open Access Journals (Sweden)

Marié De Beer

2004-10-01

Summary: When differential item functioning (DIF) procedures for item analysis based on item response theory (IRT) are used during test construction, it is possible to plot item characteristic curves for the same item for different subgroups. These curves indicate how each item functions for the different subgroups at different ability levels. DIF is indicated by the area between the curves. DIF was used in the construction of the 'Learning Potential Computerised Adaptive Test (LPCAT)' to identify the items that displayed bias with respect to gender, culture, language or level of education. Items that exceeded a predetermined level of DIF were omitted from the final item bank, regardless of which subgroup was advantaged or disadvantaged. The process and results of the DIF analysis are discussed.
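The idea that DIF is indicated by the area between the item characteristic curves of two subgroups can be sketched numerically. This is a minimal illustration only: it assumes a two-parameter logistic curve, a trapezoidal integration over a fixed ability range, and made-up item parameters. The function names, the integration range and the parameter values are all hypothetical and are not the LPCAT procedure.

```python
import math


def icc(theta, a, b):
    """Item characteristic curve: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area(a_ref, b_ref, a_foc, b_foc, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoidal estimate of the area between the reference and focal group curves."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        gap = abs(icc(theta, a_ref, b_ref) - icc(theta, a_foc, b_foc))
        weight = 0.5 if i in (0, steps) else 1.0  # end points get half weight
        total += weight * gap
    return total * h


# Hypothetical item: same discrimination in both groups, difficulty shifted by 0.5.
area = unsigned_area(a_ref=1.0, b_ref=0.0, a_foc=1.0, b_foc=0.5)
print(round(area, 3))
```

When both groups share the same discrimination, the area over the full ability range equals the absolute difference in difficulty, which gives a quick sanity check on the numerical result. An item would then be flagged when its area exceeds whatever cutoff the test developer has fixed in advance.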
Explanatory item response modelling of an abstract reasoning assessment: A case for modern test design

OpenAIRE

Helland, Fredrik

2016-01-01

Assessment is an integral part of society and education, and for this reason it is important to know what you measure. This thesis is about explanatory item response modelling of an abstract reasoning assessment, with the objective to create a modern test design framework for automatic generation of valid and precalibrated items of abstract reasoning. Modern test design aims to strengthen the connections between the different components of a test, with a stress on strong theory, systematic it...
Gender-Based Differential Item Performance in Mathematics Achievement Items.

Science.gov (United States)

Doolittle, Allen E.; Cleary, T. Anne

1987-01-01

Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)
A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing.

Science.gov (United States)

van Rijn, Peter W; Ali, Usama S

2017-05-01

We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.
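The two-parameter logistic model mentioned above as the common marginal model for accuracy can be written in a few lines. The sketch below also shows maximum-information item selection, a common textbook rule for adaptive testing; the abstract does not state which selection rule the authors used, so that part is an assumption, and the item pool and function names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def pick_next_item(theta, pool, administered):
    """Return the not-yet-administered item with the largest information at theta."""
    candidates = [name for name in pool if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *pool[name]))


# Hypothetical pool: item name -> (a, b).
pool = {"item_1": (1.0, -1.0), "item_2": (1.5, 0.0), "item_3": (0.8, 1.2)}

first = pick_next_item(0.0, pool, administered=set())
second = pick_next_item(0.0, pool, administered={first})
print(first, second)
```

In an actual adaptive test the ability estimate would be updated after each response before the next item is picked; that estimation step is left out here.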
The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory

Science.gov (United States)

Anil, Duygu

2008-01-01

In this study, the predictive power of experts' predictions of item characteristics, for conditions in which try-out practices cannot be applied, was examined for item characteristics computed according to classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

Science.gov (United States)

Kaskowitz, Gary S.; De Ayala, R. J.

2001-01-01

Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
The Relative Importance of Persons, Items, Subtests, and Languages to TOEFL Test Variance.

Science.gov (United States)

Brown, James Dean

1999-01-01

Explored the relative contributions to Test of English as a Foreign Language (TOEFL) score dependability of various numbers of persons, items, subtests, languages, and their various interactions. Sampled 15,000 test takers, 1000 each from 15 different language backgrounds. (Author/VWL)

A Method for Generating Educational Test Items That Are Aligned to the Common Core State Standards

Science.gov (United States)

Gierl, Mark J.; Lai, Hollis; Hogan, James B.; Matovinovic, Donna

2015-01-01

The demand for test items far outstrips the current supply. This increased demand can be attributed, in part, to the transition to computerized testing, but, it is also linked to dramatic changes in how 21st century educational assessments are designed and administered. One way to address this growing demand is with automatic item generation.…
A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests.

Science.gov (United States)

Kingsbury, G. Gage; Zara, Anthony R.

1991-01-01

This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that the price in terms of additional test items of using constrained CAT for content balancing is much smaller than that of using testlets. (SLD)
An emotional functioning item bank of 24 items for computerized adaptive testing (CAT) was established

DEFF Research Database (Denmark)

Petersen, Morten Aa.; Gamper, Eva-Maria; Costantini, Anna

2016-01-01

of the widely used EORTC Quality of Life questionnaire (QLQ-C30). STUDY DESIGN AND SETTING: On the basis of literature search and evaluations by international samples of experts and cancer patients, 38 candidate items were developed. The psychometric properties of the items were evaluated in a large ... international sample of cancer patients. This included evaluations of dimensionality, item response theory (IRT) model fit, differential item functioning (DIF), and of measurement precision/statistical power. RESULTS: Responses were obtained from 1,023 cancer patients from four countries. The evaluations showed ... that 24 items could be included in a unidimensional IRT model. DIF did not seem to have any significant impact on the estimation of EF. Evaluations indicated that the CAT measure may reduce sample size requirements by up to 50% compared to the QLQ-C30 EF scale without reducing power. CONCLUSION...
Piecewise Polynomial Fitting with Trend Item Removal and Its Application in a Cab Vibration Test

Directory of Open Access Journals (Sweden)

Wu Ren

2018-01-01

The trend item of a long-term vibration signal is difficult to remove. This paper proposes a piecewise integration method to remove trend items. Examples of direct integration without trend item removal, global integration after piecewise polynomial fitting with trend item removal, and direct integration after piecewise polynomial fitting with trend item removal were simulated. The results showed that direct integration of the fitted piecewise polynomial provided greater acceleration and displacement precision than the other two integration methods. A vibration test was then performed on a special equipment cab. The results indicated that direct integration by piecewise polynomial fitting with trend item removal was highly consistent with the measured signal data. However, the direct integration method without trend item removal resulted in signal distortion. The proposed method can help with frequency domain analysis of vibration signals and modal parameter identification for such equipment.
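Trend removal by piecewise fitting can be illustrated with a deliberately simplified sketch. It fits a straight line (a degree-1 polynomial) to each fixed-length segment by ordinary least squares and subtracts it. The paper's own polynomial order, segmentation and integration steps are not given in the abstract, so the segment length, the linear fit and the function names here are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def piecewise_detrend(signal, segment_len):
    """Subtract a separately fitted line from each consecutive segment of the signal."""
    detrended = []
    for start in range(0, len(signal), segment_len):
        ys = signal[start:start + segment_len]
        xs = list(range(len(ys)))
        if len(ys) < 2:
            # A lone trailing sample is treated as pure trend.
            detrended.extend(0.0 for _ in ys)
            continue
        slope, intercept = fit_line(xs, ys)
        detrended.extend(y - (slope * x + intercept) for x, y in zip(xs, ys))
    return detrended


# Hypothetical signal: a drift that changes slope halfway, with no vibration content.
drift = [0.5 * i + 2.0 for i in range(10)] + [6.5 - 0.3 * i for i in range(10)]
residual = piecewise_detrend(drift, segment_len=10)
print(max(abs(v) for v in residual))
```

Because each segment is fitted independently, the detrended signal can jump at segment boundaries; a higher polynomial order or overlapping segments are common ways to soften that.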
Branched Adaptive Testing with a Rasch-Model-Calibrated Test: Analysing Item Presentation's Sequence Effects Using the Rasch-Model-Based LLTM

Science.gov (United States)

Kubinger, Klaus D.; Reif, Manuel; Yanagida, Takuya

2011-01-01

Item position effects provoke serious problems within adaptive testing. This is because different testees are necessarily presented with the same item at different presentation positions, as a consequence of which comparing their ability parameter estimations in the case of such effects would not at all be fair. In this article, a specific…
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

Science.gov (United States)

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Test-retest reliability of selected items of Health Behaviour in School-aged Children (HBSC) survey questionnaire in Beijing, China

Directory of Open Access Journals (Sweden)

Liu Yang

2010-08-01

Background: Children's health and health behaviour are essential for their development and it is important to obtain abundant and accurate information to understand young people's health and health behaviour. The Health Behaviour in School-aged Children (HBSC) study is among the first large-scale international surveys on adolescent health through self-report questionnaires. So far, more than 40 countries in Europe and North America have been involved in the HBSC study. The purpose of this study is to assess the test-retest reliability of selected items in the Chinese version of the HBSC survey questionnaire in a sample of adolescents in Beijing, China. Methods: A sample of 95 male and female students aged 11 or 15 years old participated in a test and retest with a three-week interval. Student identity numbers of respondents were utilized to permit matching of test-retest questionnaires. 23 items concerning physical activity, sedentary behaviour, sleep and substance use were evaluated by using the percentage of response shifts and the single measure Intraclass Correlation Coefficients (ICC) with 95% confidence interval (CI) for all respondents and stratified by gender and age. Items on substance use were only evaluated for school children aged 15 years old. Results: The percentage of no response shift between test and retest varied from 32% for the item on computer use at weekends to 92% for the three items on smoking. Of all the 23 items evaluated, 6 items (26%) showed a moderate reliability, 12 items (52%) displayed a substantial reliability and 4 items (17%) indicated almost perfect reliability. No gender or age group differences in test-retest reliability were found except for a few items on sedentary behaviour. Conclusions: The overall findings of this study suggest that most selected indicators in the HBSC survey questionnaire have satisfactory test-retest reliability for the students in Beijing. Further test-retest studies in a large
Analyzing Item Generation with Natural Language Processing Tools for the "TOEIC"® Listening Test. Research Report. ETS RR-17-52

Science.gov (United States)

Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin

2017-01-01

In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…
The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

Science.gov (United States)

Bodenburg, Sebastian; Dopslaff, Nina

2008-01-01

The Dysexecutive Questionnaire (DEX; Behavioral assessment of the dysexecutive syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have been no fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports data on the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 factors found and of the questionnaire as a whole is carried out on the basis of quartile standards.
Development of Test Items Related to Selected Concepts Within the Scheme the Particle Nature of Matter.

Science.gov (United States)

Doran, Rodney L.; Pella, Milton O.

The purpose of this study was to develop test items with a minimum reading demand for use with pupils at grade levels two through six. An item was judged to be acceptable if the item satisfied at least four of six criteria. Approximately 250 students in grades 2-6 participated in the study. Half of the students were given instruction to develop…
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

Science.gov (United States)

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
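As a rough illustration of the quantity discussed above, the difficulty index of an item is commonly taken as the proportion of examinees who pass it. The sketch below computes that index and a Pearson chi-square statistic for equal pass rates across items. The abstract does not state which test the authors used, so the chi-square comparison, the function names and the counts are assumptions made purely for illustration.

```python
def difficulty_index(num_passed: int, num_attempted: int) -> float:
    """Proportion of examinees who passed the item (higher means easier)."""
    if num_attempted <= 0:
        raise ValueError("num_attempted must be positive")
    return num_passed / num_attempted


def chi_square_homogeneity(passed, attempted):
    """Pearson chi-square statistic for the hypothesis that all items
    share one pass rate. Returns (statistic, degrees_of_freedom)."""
    pooled_rate = sum(passed) / sum(attempted)
    statistic = 0.0
    for k, n in zip(passed, attempted):
        expected_pass = n * pooled_rate
        expected_fail = n * (1.0 - pooled_rate)
        statistic += (k - expected_pass) ** 2 / expected_pass
        statistic += ((n - k) - expected_fail) ** 2 / expected_fail
    return statistic, len(passed) - 1


# Hypothetical counts for four randomly selected skills items (not study data).
passed = [90, 70, 80, 60]
attempted = [100, 100, 100, 100]
indices = [difficulty_index(k, n) for k, n in zip(passed, attempted)]
stat, df = chi_square_homogeneity(passed, attempted)
```

A large statistic relative to the chi-square distribution with `df` degrees of freedom would suggest that examinees who draw different items face unequal difficulty, which is the concern the abstract raises.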
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

Directory of Open Access Journals (Sweden)

Bongyeun Koh

2016-01-01

Full Text Available Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
A leukocyte activation test identifies food items which induce release of DNA by innate immune peripheral blood leucocytes.

Science.gov (United States)

Garcia-Martinez, Irma; Weiss, Theresa R; Yousaf, Muhammad N; Ali, Ather; Mehal, Wajahat Z

2018-01-01

Leukocyte activation (LA) testing identifies food items that induce a patient specific cellular response in the immune system, and has recently been shown in a randomized double blinded prospective study to reduce symptoms in patients with irritable bowel syndrome (IBS). We hypothesized that test reactivity to particular food items, and the systemic immune response initiated by these food items, is due to the release of cellular DNA from blood immune cells. We tested this by quantifying total DNA concentration in the cellular supernatant of immune cells exposed to positive and negative foods from 20 healthy volunteers. To establish if the DNA release by positive samples is a specific phenomenon, we quantified myeloperoxidase (MPO) in cellular supernatants. We further assessed if a particular immune cell population (neutrophils, eosinophils, and basophils) was activated by the positive food items by flow cytometry analysis. To identify the signaling pathways that are required for DNA release we tested if specific inhibitors of key signaling pathways could block DNA release. Foods with a positive LA test result gave a higher supernatant DNA content when compared to foods with a negative result. This was specific as MPO levels were not increased by foods with a positive LA test. Protein kinase C (PKC) inhibitors resulted in inhibition of positive food stimulated DNA release. Positive foods resulted in CD63 levels greater than negative foods in eosinophils in 76.5% of tests. LA test identifies food items that result in release of DNA and activation of peripheral blood innate immune cells in a PKC dependent manner, suggesting that this LA test identifies food items that result in release of inflammatory markers and activation of innate immune cells. This may be the basis for the improvement in symptoms in IBS patients who followed an LA test guided diet.
Reading ability and print exposure: item response theory analysis of the author recognition test.

Science.gov (United States)

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
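The two scoring rules mentioned above can be sketched as follows. The standard rule (names selected minus foils selected) comes from the abstract; the size of the heavier foil penalty is not given there, so the value 2.0 and all names in this example are hypothetical.

```python
def art_score(selected, real_authors, foil_penalty=1.0):
    """Score an author recognition test response.

    selected      -- names the participant marked as authors
    real_authors  -- names on the list that are genuine authors
    foil_penalty  -- points deducted per selected foil (1.0 = standard rule)

    Assumes every selected name that is not a real author is a foil.
    """
    selected = set(selected)
    real_authors = set(real_authors)
    hits = len(selected & real_authors)
    foils_chosen = len(selected - real_authors)
    return hits - foil_penalty * foils_chosen


# Placeholder names, not items from the actual ART.
authors = {"author_a", "author_b", "author_c"}
response = {"author_a", "author_b", "foil_x"}

standard = art_score(response, authors)                     # 2 hits - 1 foil
stricter = art_score(response, authors, foil_penalty=2.0)   # illustrative weight
```

Keeping the penalty as a parameter makes it easy to compare how different weights change the ranking of participants.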
Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

Science.gov (United States)

Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

2013-07-01

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
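For orientation, the classical standardized residual that the abstract uses as a comparison point can be sketched as below: examinees are grouped by ability, and the observed proportion correct in each group is compared with the proportion the fitted model predicts. This is a minimal sketch of that baseline only, not the new residual proposed by the authors, and the two-parameter logistic curve and all numeric values are assumptions chosen for illustration.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic item characteristic curve (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def standardized_residual(observed_prop: float, expected_prop: float, n: int) -> float:
    """(observed - expected) divided by the binomial standard error
    of the expected proportion for a group of n examinees."""
    standard_error = math.sqrt(expected_prop * (1.0 - expected_prop) / n)
    return (observed_prop - expected_prop) / standard_error


# Hypothetical ability group: 100 examinees near theta = 0, 60% answered correctly.
expected = icc_2pl(theta=0.0, a=1.0, b=0.0)
residual = standardized_residual(observed_prop=0.6, expected_prop=expected, n=100)
```

Residuals far from zero in several ability groups would point to misfit of the item under the assumed model.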
Role of STAT3 in Transformation and Drug Resistance in CML

International Nuclear Information System (INIS)

Nair, Rajesh R.; Tolentino, Joel H.; Hazlehurst, Lori A.

2012-01-01

Chronic myeloid leukemia (CML) is initially driven by the bcr–abl fusion oncoprotein. The identification of bcr–abl led to the discovery and rapid translation into the clinic of bcr–abl kinase inhibitors. Although bcr–abl inhibitors are efficacious, experimental evidence indicates that targeting bcr–abl is not sufficient for elimination of minimal residual disease found within the bone marrow (BM), and that the failure to eliminate the leukemic stem cell contributes to persistent minimal residual disease. Thus curative strategies will likely need to combine bcr–abl inhibitors with agents that specifically target the leukemic stem cell or the leukemic stem cell niche. One potential target is the Janus kinase (JAK)/signal transducer and activator of transcription 3 (STAT3) pathway. Recently, using STAT3 conditional knock-out mice, it was shown that STAT3 is critical for initiating the disease. Interestingly, in the absence of treatment, STAT3 was not shown to be required for maintenance of the disease, suggesting that STAT3 is required only in the tumor-initiating stem cell population (Hoelbl et al., 2010). In the context of the BM microenvironment, STAT3 is activated in a bcr–abl-independent manner by the cytokine milieu. Activation of JAK/STAT3 was shown to contribute to cell survival even in the event of complete inhibition of bcr–abl activity within the BM compartment. Taken together, these studies suggest that JAK/STAT3 is an attractive therapeutic target, and that targeting the JAK–STAT3 pathway in combination with bcr–abl kinase inhibitors may represent a viable strategy for eliminating or reducing minimal residual disease located in the BM in CML.
Role of STAT3 in Transformation and Drug Resistance in CML

Energy Technology Data Exchange (ETDEWEB)

Nair, Rajesh R.; Tolentino, Joel H.; Hazlehurst, Lori A., E-mail: lori.hazlehurst@moffitt.org [Molecular Oncology Program, H. Lee Moffitt Cancer Center, Tampa, FL (United States)

2012-04-10

Chronic myeloid leukemia (CML) is initially driven by the bcr–abl fusion oncoprotein. The identification of bcr–abl led to the discovery and rapid translation into the clinic of bcr–abl kinase inhibitors. Although bcr–abl inhibitors are efficacious, experimental evidence indicates that targeting bcr–abl is not sufficient for elimination of minimal residual disease found within the bone marrow (BM), and that the failure to eliminate the leukemic stem cell contributes to persistent minimal residual disease. Thus curative strategies will likely need to combine bcr–abl inhibitors with agents that specifically target the leukemic stem cell or the leukemic stem cell niche. One potential target is the Janus kinase (JAK)/signal transducer and activator of transcription 3 (STAT3) pathway. Recently, using STAT3 conditional knock-out mice, it was shown that STAT3 is critical for initiating the disease. Interestingly, in the absence of treatment, STAT3 was not shown to be required for maintenance of the disease, suggesting that STAT3 is required only in the tumor-initiating stem cell population (Hoelbl et al., 2010). In the context of the BM microenvironment, STAT3 is activated in a bcr–abl-independent manner by the cytokine milieu. Activation of JAK/STAT3 was shown to contribute to cell survival even in the event of complete inhibition of bcr–abl activity within the BM compartment. Taken together, these studies suggest that JAK/STAT3 is an attractive therapeutic target, and that targeting the JAK–STAT3 pathway in combination with bcr–abl kinase inhibitors may represent a viable strategy for eliminating or reducing minimal residual disease located in the BM in CML.
The chimeric ubiquitin ligase SH2-U-box inhibits the growth of imatinib-sensitive and resistant CML by targeting the native and T315I-mutant BCR-ABL.

Science.gov (United States)

Ru, Yi; Wang, Qinhao; Liu, Xiping; Zhang, Mei; Zhong, Daixing; Ye, Mingxiang; Li, Yuanchun; Han, Hua; Yao, Libo; Li, Xia

2016-06-22

Chronic myeloid leukemia (CML) is characterized by the constitutively active fusion protein tyrosine kinase BCR-ABL. Although the tyrosine kinase inhibitor (TKI) against BCR-ABL, imatinib, is the first-line therapy for CML, acquired resistance almost inevitably emerges. The underlying mechanisms are point mutations within the BCR-ABL gene, among which T315I is notorious because it resists almost all currently available inhibitors. Here we made use of a previously generated chimeric ubiquitin ligase, SH2-U-box, in which SH2 from the adaptor protein Grb2 acts as a binding domain for activated BCR-ABL, while U-box from CHIP functions as an E3 ubiquitin ligase domain, so as to target the ubiquitination and degradation of both native and T315I-mutant BCR-ABL. As such, SH2-U-box significantly inhibited proliferation and induced apoptosis in CML cells harboring either the wild-type or T315I-mutant BCR-ABL (K562 or K562R), with BCR-ABL-dependent signaling pathways being repressed. Moreover, SH2-U-box worked in concert with imatinib in K562 cells. Importantly, SH2-U-box-carrying lentivirus could markedly suppress the growth of K562-xenografts in nude mice or K562R-xenografts in SCID mice, as well as that of primary CML cells. Collectively, by degrading the native and T315I-mutant BCR-ABL, the chimeric ubiquitin ligase SH2-U-box may serve as a potential therapy for both imatinib-sensitive and resistant CML.
Test-retest reliability of Eurofit Physical Fitness items for children with visual impairments

NARCIS (Netherlands)

Houwen, Suzanne; Visscher, Chris; Hartman, Esther; Lemmink, Koen A. P. M.

The purpose of this study was to examine the test-retest reliability of physical fitness items from the European Test of Physical Fitness (Eurofit) for children with visual impairments. A sample of 21 children, ages 6-12 years, that were recruited from a special school for children with visual
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

Science.gov (United States)

Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…

Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

Science.gov (United States)

Li, Zhushan

2014-01-01

Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
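For readers unfamiliar with the setup, the logistic regression DIF model is usually written in the following form. This is the commonly used textbook parameterization, given here as an assumption; the article's own notation and derivations may differ. Here $Y$ is the item response, $X$ the matching variable (for example the total score), and $G$ a group indicator.

```latex
\[
\operatorname{logit} P(Y = 1 \mid X, G)
  = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 X G
\]
% Uniform DIF:     \beta_2 \neq 0 with \beta_3 = 0
% Nonuniform DIF:  \beta_3 \neq 0
\[
G^2 = -2\left[\ln L_{\text{reduced}} - \ln L_{\text{full}}\right]
  \;\overset{\text{approx.}}{\sim}\; \chi^2_{q},
\qquad
W = \frac{\hat{\beta}_3^{\,2}}{\widehat{\operatorname{Var}}(\hat{\beta}_3)}
  \;\overset{\text{approx.}}{\sim}\; \chi^2_{1}
\]
```

Here $q$ is the number of parameters set to zero in the reduced model (for example $q = 2$ when testing $\beta_2 = \beta_3 = 0$ jointly), and $W$ is the Wald statistic for the interaction term alone.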
Synergistic apoptosis of CML cells by buthionine sulfoximine and hydroxychavicol correlates with activation of AIF and GSH-ROS-JNK-ERK-iNOS pathway.

Directory of Open Access Journals (Sweden)

Avik Acharya Chowdhury

Full Text Available BACKGROUND: Hydroxychavicol (HCH), a constituent of Piper betle leaf, has been reported to exert anti-leukemic activity through induction of reactive oxygen species (ROS). The aim of the study is to optimize oxidative stress-induced chronic myeloid leukemic (CML) cell death by combining the glutathione synthesis inhibitor buthionine sulfoximine (BSO) with HCH and to study the underlying mechanism. MATERIALS AND METHODS: Anti-proliferative activity of BSO and HCH alone or in combination against a number of leukemic (K562, KCL22, KU812, U937, Molt4) and non-leukemic (A549, MIA-PaCa2, PC-3, HepG2) cancer cell lines and normal cell lines (NIH3T3, Vero) was measured by MTT assay. Apoptotic activity in the CML cell line K562 was detected by flow cytometry (FCM) after staining with annexin V-FITC/propidium iodide (PI), detection of reduced mitochondrial membrane potential after staining with JC-1, cleavage of caspase-3 and poly (ADP-ribose) polymerase proteins by western blot analysis and translocation of apoptosis inducing factor (AIF) by confocal microscopy. Intracellular reduced glutathione (GSH) was measured by colorimetric assay using a GSH assay kit. 2',7'-dichlorodihydrofluorescein diacetate (DCF-DA) and 4-amino-5-methylamino-2',7'-difluorofluorescein (DAF-FM) were used as probes to measure intracellular increases in ROS and nitric oxide (NO) levels, respectively. Multiple techniques such as siRNA transfection and pharmacological inhibition were used to understand the mechanisms of action. RESULTS: Non-apoptotic concentrations of BSO significantly potentiated HCH-induced apoptosis in K562 cells. BSO potentiated the apoptosis-inducing activity of HCH in CML cells in a caspase-dependent as well as a caspase-independent but apoptosis inducing factor (AIF)-dependent manner. Enhanced depletion of intracellular GSH induced by combined treatment correlated with induction of ROS. Activation of ROS-dependent JNK played a crucial role in ERK1/2 activation which subsequently induced the
Bayes factor covariance testing in item response models

NARCIS (Netherlands)

Fox, J.P.; Mulder, J.; Sinharay, Sandip

2017-01-01

Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning
Bayes Factor Covariance Testing in Item Response Models

NARCIS (Netherlands)

Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

2017-01-01

Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning
An InGaAs/InP 40 GHz CML static frequency divider

International Nuclear Information System (INIS)

Su Yongbo; Jin Zhi; Cheng Wei; Ge Ji; Wang Xiantai; Chen Gaopeng; Liu Xinyu; Xu Anhuai; Qi Ming

2011-01-01

Static frequency dividers are widely used as a circuit performance benchmark or figure-of-merit indicator to gauge a particular device technology's ability to implement high speed digital and integrated high performance mixed-signal circuits. We report a 2 : 1 static frequency divider in InGaAs/InP heterojunction bipolar transistor technology. This is the first InP based digital integrated circuit ever reported on the mainland of China. The divider is implemented in differential current mode logic (CML) with 30 transistors. The circuit operated at a peak clock frequency of 40 GHz and dissipated 650 mW from a single -5 V supply. (semiconductor integrated circuits)
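The behavior of a 2:1 static divider can be illustrated with a purely logical model: the output toggles on every rising clock edge, so it completes one cycle for every two input cycles (40 GHz in, 20 GHz out). The sketch below is a behavioral toy in software, not a model of the reported InGaAs/InP CML circuit; the function names are hypothetical. The supply-current helper only restates the abstract's figures (650 mW from a 5 V magnitude supply).

```python
def divide_by_two(clock_samples):
    """Toggle the output on each rising edge of a sampled 0/1 clock."""
    output = []
    state = 0
    previous = 0
    for level in clock_samples:
        if previous == 0 and level == 1:  # rising edge detected
            state ^= 1
        output.append(state)
        previous = level
    return output


def supply_current_ma(power_mw: float, supply_v: float) -> float:
    """Average supply current in mA from dissipated power and supply voltage."""
    return power_mw / abs(supply_v)


clock = [0, 1, 0, 1, 0, 1, 0, 1]      # four input cycles
divided = divide_by_two(clock)         # two output cycles
current = supply_current_ma(650.0, -5.0)
```

The divided waveform repeats every four samples where the input repeats every two, which is the halving of frequency that the benchmark circuit performs in hardware.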
Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

Science.gov (United States)

Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

2018-01-01

Objective: The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods: The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted to examine construct validity and estimate item parameters after investigation of unidimensionality, equality of slope parameters, item fit, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results: The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) a reliability coefficient greater than 0.9 or (b) 14 items administered. The simulation results also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion: The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of typically developing (TD) children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with
On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

Science.gov (United States)

Raykov, Tenko; Marcoulides, George A.

2016-01-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…
Differential Item Functioning in While-Listening Performance Tests: The Case of the International English Language Testing System (IELTS) Listening Module

Science.gov (United States)

Aryadoust, Vahid

2012-01-01

This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…
Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory.

Science.gov (United States)

Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal

2017-01-01

The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Few studies on its validity can yet be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor model and a three-factor model were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were also identified with CTT. This study tested different models, with fewer problematic items found in the three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still rarely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

Science.gov (United States)

Sinharay, Sandip

2017-09-01

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

Science.gov (United States)

Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

2016-01-01

Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
Item response theory

Directory of Open Access Journals (Sweden)

Eutalia Aparecida Candido de Araujo

2009-12-01

Full Text Available The concern with measuring psychological traits is an old one, and many studies and methods have been developed toward this goal. Prominent among the proposed approaches is Item Response Theory (IRT), which initially emerged to address limitations of Classical Test Theory, still widely used today in the measurement of psychological traits. The main point of IRT is that it takes each item into account individually rather than only the total scores; therefore, the conclusions depend not only on the test or questionnaire but on each item that composes it. This article aims to present this theory, which revolutionized measurement theory.
An Exercise in Extrapolation: Clinical Management of Atypical CML, MDS/MPN-Unclassifiable, and MDS/MPN-RS-T.

Science.gov (United States)

Talati, Chetasi; Padron, Eric

2016-12-01

According to the recently published 2016 World Health Organization (WHO) classification of myeloid malignancies, myelodysplastic/myeloproliferative neoplasms (MDS/MPN) include atypical chronic myeloid leukemia (aCML), MDS/MPN-unclassifiable (MDS/MPN-U), chronic myelomonocytic leukemia (CMML), juvenile myelomonocytic leukemia (JMML), and MDS/MPN ring sideroblasts with thrombocytosis (MDS/MPN-RS-T). MDS/MPN-RS-T was previously a provisional category known as refractory anemia with ring sideroblasts with thrombocytosis (RARS-T) which has now attained a distinct designation in the 2016 WHO classification. In this review, we focus on biology and management of aCML, MDS/MPN-U, and MDS/MPN-RS-T. There is considerable overlap between these entities which we attempt to further elucidate in this review. We also discuss recent advances in the field of molecular landscape that further defines and characterizes this heterogeneous group of disorders. The paucity of clinical trials available secondary to unclear pathogenesis and rarity of these diseases makes the management of these entities clinically challenging. This review summarizes some of the current knowledge of the molecular pathogenesis and suggested treatment guidelines based on the available data.
Comparison of Classical Test Theory and Item Response Theory in Individual Change Assessment

NARCIS (Netherlands)

Jabrayilov, Ruslan; Emons, Wilco H. M.; Sijtsma, Klaas

2016-01-01

Clinical psychologists are advised to assess clinical and statistical significance when assessing change in individual patients. Individual change assessment can be conducted using either the methodologies of classical test theory (CTT) or item response theory (IRT). Researchers have been optimistic
A more general model for testing measurement invariance and differential item functioning.

Science.gov (United States)

Bauer, Daniel J

2017-09-01

The evaluation of measurement invariance is an important step in establishing the validity and comparability of measurements across individuals. Most commonly, measurement invariance has been examined using one of two primary latent variable modeling approaches: the multiple groups model or the multiple-indicator multiple-cause (MIMIC) model. Both approaches offer opportunities to detect differential item functioning within multi-item scales, and thereby to test measurement invariance, but both approaches also have significant limitations. The multiple groups model allows one to examine the invariance of all model parameters but only across levels of a single categorical individual difference variable (e.g., ethnicity). In contrast, the MIMIC model permits both categorical and continuous individual difference variables (e.g., sex and age) but permits only a subset of the model parameters to vary as a function of these characteristics. The current article argues that moderated nonlinear factor analysis (MNLFA) constitutes an alternative, more flexible model for evaluating measurement invariance and differential item functioning. We show that the MNLFA subsumes and combines the strengths of the multiple group and MIMIC models, allowing for a full and simultaneous assessment of measurement invariance and differential item functioning across multiple categorical and/or continuous individual difference variables. The relationships between the MNLFA model and the multiple groups and MIMIC models are shown mathematically and via an empirical demonstration. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A Review of Classical Methods of Item Analysis.

Science.gov (United States)

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
A simple and fast item selection procedure for adaptive testing

NARCIS (Netherlands)

Veerkamp, Wim J.J.; Berger, Martijn P.F.

1994-01-01

Items with the highest discrimination parameter values in a logistic item response theory (IRT) model do not necessarily give maximum information. This paper shows which discrimination parameter values (as a function of the guessing parameter and the distance between person ability and item
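The claim in this abstract can be illustrated with a minimal sketch, assuming the standard three-parameter logistic (3PL) model without the 1.7 scaling constant and Birnbaum's item information formula. The function names and the parameter values below are hypothetical and are not taken from the paper.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person ability, a: discrimination, b: difficulty, c: guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info_3pl(theta, a, b, c):
    """Item information for the 3PL model: a^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Two hypothetical items that differ only in discrimination.
near_low = info_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
near_high = info_3pl(theta=0.0, a=3.0, b=0.0, c=0.2)
far_low = info_3pl(theta=2.0, a=1.0, b=0.0, c=0.2)
far_high = info_3pl(theta=2.0, a=3.0, b=0.0, c=0.2)
```

In this sketch the more discriminating item is more informative when ability is close to the item difficulty, while the less discriminating item is more informative two units away, which is the kind of trade-off the abstract describes.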
Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory.

Science.gov (United States)

Fajrianthi; Zein, Rizqy Amelia

2017-01-01

This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (level of ability = -2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
Software Note: Using BILOG for Fixed-Anchor Item Calibration

Science.gov (United States)

DeMars, Christine E.; Jurich, Daniel P.

2012-01-01

The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, are administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…
A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

Science.gov (United States)

Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

2018-04-10

To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading .3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.

Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

Science.gov (United States)

Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

2015-01-01

Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
Item Banking with Embedded Standards

Science.gov (United States)

MacCann, Robert G.; Stanley, Gordon

2009-01-01

An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…
Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

Science.gov (United States)

Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

2011-01-01

Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory

Directory of Open Access Journals (Sweden)

Fajrianthi

2017-11-01

Full Text Available Fajrianthi (1), Rizqy Amelia Zein (2). (1) Department of Industrial and Organizational Psychology, (2) Department of Personality and Social Psychology, Faculty of Psychology, Universitas Airlangga, Surabaya, East Java, Indonesia. Abstract: This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (level of ability = -2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA’s item analysis and dimensionality test of each TKEA subset. Keywords: categorical confirmatory factor analysis, emotional intelligence, item response theory
Assessment of chromium(VI) release from 848 jewellery items by use of a diphenylcarbazide spot test

DEFF Research Database (Denmark)

Bregnbak, David; Johansen, Jeanne D.; Hamann, Dathan

2016-01-01

We recently evaluated and validated a diphenylcarbazide (DPC)-based screening spot test that can detect the release of chromium(VI) ions (≥0.5 ppm) from various metallic items and leather goods (1). We then screened a selection of metal screws, leather shoes, and gloves, as well as 50 earrings, and identified chromium(VI) release from one earring. In the present study, we used the DPC spot test to assess chromium(VI) release in a much larger sample of jewellery items (n=848), 160 (19%) of which had previously been shown to contain chromium when analysed with X-ray fluorescence spectroscopy (2).
Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

Science.gov (United States)

Kim, Seonghoon

2013-01-01

With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
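The recursion mentioned in this abstract can be sketched as follows for the original dichotomous case. This is a minimal illustration of the Lord and Wingersky idea and not the generalized real-number-score algorithm of the article; the function name and the example probabilities are hypothetical.

```python
def lord_wingersky(probs):
    """Distribution of the number-correct score at a fixed proficiency.

    probs: per-item probabilities of a correct response, evaluated at one
    proficiency value (for example from a fitted IRT model).
    Returns a list where entry s is P(score = s | proficiency).
    """
    dist = [1.0]  # with no items administered, the score is 0 with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)  # item answered incorrectly
            new[score + 1] += mass * p      # item answered correctly
        dist = new
    return dist


# Hypothetical three-item test at one proficiency level.
score_dist = lord_wingersky([0.5, 0.5, 0.8])
```

Each item is added one at a time, so the cost grows with the square of the number of items rather than with the number of response patterns.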
Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

Science.gov (United States)

Cao, Yi; Lu, Ru; Tao, Wei

2014-01-01

The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
Modeling Item-Level and Step-Level Invariance Effects in Polytomous Items Using the Partial Credit Model

Science.gov (United States)

Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.

2012-01-01

Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…
Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

Science.gov (United States)

Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

2018-02-23

The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.
The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease

DEFF Research Database (Denmark)

Jørgensen, Kasper; Kristensen, Maria K; Waldemar, Gunhild

2015-01-01

This study presents a reliable, short and practical version of the Clock Drawing Test (CDT) for clinical use and examines its diagnostic accuracy in mild Alzheimer's disease versus elderly nonpatients. Clock drawings from 231 participants were scored independently by four clinical neuropsychologists blind to diagnostic classification. The interrater agreement of individual scoring criteria was analyzed and items with poor or moderate reliability were excluded. The classification accuracy of the resulting scoring system - the six-item CDT - was examined. We explored the effect of further...
Test of Achievement in Quantitative Economics for Secondary Schools: Construction and Validation Using Item Response Theory

Science.gov (United States)

Eleje, Lydia I.; Esomonu, Nkechi P. M.

2018-01-01

A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple-choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…
Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds : Multiple-Choice Versus Constructed-Response Training Items

NARCIS (Netherlands)

Stevenson, C.E.; Heiser, W.J.; Resing, W.C.M.

2016-01-01

Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC
Development of Abbreviated Nine-Item Forms of the Raven's Standard Progressive Matrices Test

Science.gov (United States)

Bilker, Warren B.; Hansen, John A.; Brensinger, Colleen M.; Richard, Jan; Gur, Raquel E.; Gur, Ruben C.

2012-01-01

The Raven's Standard Progressive Matrices (RSPM) is a 60-item test for measuring abstract reasoning, considered a nonverbal estimate of fluid intelligence, and often included in clinical assessment batteries and research on patients with cognitive deficits. The goal was to develop and apply a predictive model approach to reduce the number of items…
Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

Science.gov (United States)

Andriessen, Teuntje M J C; de Jong, Ben; Jacobs, Bram; van der Werf, Sieberen P; Vos, Pieter E

2009-04-01

To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). Daily testing was performed in 64 consecutively admitted traumatic brain injured patients, 22 orthopedically injured patients and 26 healthy controls until criteria for resolution of PTA were reached. Subjects were randomly assigned to a test with visual or verbal stimuli. Short delay reproduction was tested after an interval of 3-5 minutes, long delay reproduction was tested after 24 hours. Sensitivity and specificity were calculated over the first 4 test days. The 3-word test showed higher sensitivity than the 3-picture test, while specificity of the two tests was equally high. Free recall was a more effortful task than recognition for both patients and controls. In patients, a longer delay between registration and recall resulted in a significant decrease in the number of items reproduced. Presence of PTA is best assessed with a memory test that incorporates the free recall of words after a long delay.
Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

Science.gov (United States)

Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

2010-01-01

This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

Science.gov (United States)

Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

2016-01-01

In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Towards an authoring system for item construction

NARCIS (Netherlands)

Rikers, Jos H.A.N.

1988-01-01

The process of writing test items is analyzed, and a blueprint is presented for an authoring system for test item writing to reduce invalidity and to structure the process of item writing. The developmental methodology is introduced, and the first steps in the process are reported. A historical
A new color image encryption scheme using CML and a fractional-order chaotic system.

Directory of Open Access Journals (Sweden)

Xiangjun Wu

Full Text Available The chaos-based image cryptosystems have been widely investigated in recent years to provide real-time encryption and transmission. In this paper, a novel color image encryption algorithm by using coupled-map lattices (CML and a fractional-order chaotic system is proposed to enhance the security and robustness of the encryption algorithms with a permutation-diffusion structure. To make the encryption procedure more confusing and complex, an image division-shuffling process is put forward, where the plain-image is first divided into four sub-images, and then the position of the pixels in the whole image is shuffled. In order to generate initial conditions and parameters of two chaotic systems, a 280-bit long external secret key is employed. The key space analysis, various statistical analysis, information entropy analysis, differential analysis and key sensitivity analysis are introduced to test the security of the new image encryption algorithm. The cryptosystem speed is analyzed and tested as well. Experimental results confirm that, in comparison to other image encryption schemes, the new algorithm has higher security and is fast for practical image encryption. Moreover, an extensive tolerance analysis of some common image processing operations such as noise adding, cropping, JPEG compression, rotation, brightening and darkening, has been performed on the proposed image encryption technique. Corresponding results reveal that the proposed image encryption method has good robustness against some image processing operations and geometric attacks.
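For readers unfamiliar with coupled-map lattices, the sketch below iterates the textbook nearest-neighbour lattice with a logistic local map and periodic boundaries. It is an assumption-laden illustration only: the abstract does not give the paper's lattice form, parameters, or key schedule, and the names `cml_step`, `eps`, and `mu` are hypothetical.

```python
def logistic(x, mu=3.99):
    """Logistic map, chaotic for mu close to 4 and x in [0, 1]."""
    return mu * x * (1.0 - x)


def cml_step(state, eps=0.1, mu=3.99):
    """One update of a nearest-neighbour coupled-map lattice.

    Each site mixes its own mapped value with the mapped values of its
    two neighbours; indices wrap around (periodic boundary).
    """
    n = len(state)
    f = [logistic(x, mu) for x in state]
    return [
        (1.0 - eps) * f[i] + 0.5 * eps * (f[i - 1] + f[(i + 1) % n])
        for i in range(n)
    ]


def cml_iterate(state, steps, eps=0.1, mu=3.99):
    """Apply cml_step repeatedly and return the final lattice state."""
    for _ in range(steps):
        state = cml_step(state, eps, mu)
    return state


# Hypothetical initial conditions; in a cryptosystem these would be derived from the key.
final_state = cml_iterate([0.123, 0.456, 0.789, 0.321], steps=100)
```

Because each update is a convex combination of logistic-map outputs, the lattice values stay in the unit interval, which makes them convenient to quantize into a keystream in schemes of this general kind.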
A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing

Science.gov (United States)

Barrada, Juan Ramon; Olea, Julio; Ponsoda, Vicente; Abad, Francisco Jose

2010-01-01

In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or…
Using Set Covering with Item Sampling to Analyze the Infeasibility of Linear Programming Test Assembly Models

Science.gov (United States)

Huitzing, Hiddo A.

2004-01-01

This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be…

Memory for Items and Relationships among Items Embedded in Realistic Scenes: Disproportionate Relational Memory Impairments in Amnesia

Science.gov (United States)

Hannula, Deborah E.; Tranel, Daniel; Allen, John S.; Kirchhoff, Brenda A.; Nickel, Allison E.; Cohen, Neal J.

2014-01-01

Objective: The objective of this study was to examine the dependence of item memory and relational memory on medial temporal lobe (MTL) structures. Patients with amnesia, who either had extensive MTL damage or damage that was relatively restricted to the hippocampus, were tested, as was a matched comparison group. Disproportionate relational memory impairments were predicted for both patient groups, and those with extensive MTL damage were also expected to have impaired item memory. Method: Participants studied scenes, and were tested with interleaved two-alternative forced-choice probe trials. Probe trials were either presented immediately after the corresponding study trial (lag 1), five trials later (lag 5), or nine trials later (lag 9) and consisted of the studied scene along with a manipulated version of that scene in which one item was replaced with a different exemplar (item memory test) or was moved to a new location (relational memory test). Participants were to identify the exact match of the studied scene. Results: As predicted, patients were disproportionately impaired on the test of relational memory. Item memory performance was marginally poorer among patients with extensive MTL damage, but both groups were impaired relative to matched comparison participants. Impaired performance was evident at all lags, including the shortest possible lag (lag 1). Conclusions: The results are consistent with the proposed role of the hippocampus in relational memory binding and representation, even at short delays, and suggest that the hippocampus may also contribute to successful item memory when items are embedded in complex scenes. PMID:25068665
P2-19: The Effect of Item Repetition on Item-Context Association Depends on the Prior Exposure of Items

Directory of Open Access Journals (Sweden)

Hongmi Lee

2012-10-01

Full Text Available Previous studies have reported conflicting findings on whether item repetition has beneficial or detrimental effects on source memory. To reconcile such contradictions, we investigated whether the degree of pre-exposure of items can be a potential modulating factor. The experimental procedures spanned two consecutive days. On Day 1, participants were exposed to a set of unfamiliar faces. On Day 2, the same faces presented on the previous day were used again for half of the participants, whereas novel faces were used for the other half. Day 2 procedures consisted of three successive phases: item repetition, source association, and source memory test. In the item repetition phase, half of the face stimuli were repeatedly presented while participants were making male/female judgments. During the source association phase, both the repeated and the unrepeated faces appeared in one of the four locations on the screen. Finally, participants were tested on the location in which a given face was presented during the previous phase and reported the confidence of their memory. Source memory accuracy was measured as the percentage of correct non-guess trials. We found a significant interaction between prior exposure and repetition. Repetition impaired source memory when the items had been pre-exposed on Day 1, while it led to greater accuracy for novel ones. These results show that pre-experimental exposure can modulate the effects of repetition on associative binding between an item and its contextual information, suggesting that pre-existing representation and novelty signal interact to form new episodic memory.
Developing and testing items for the South African Personality Inventory (SAPI)

Directory of Open Access Journals (Sweden)

Carin Hill

2013-11-01

Research purpose: This article reports on the process of identifying items for, and provides a quantitative evaluation of, the South African Personality Inventory (SAPI) items. Motivation for the study: The study intended to develop an indigenous and psychometrically sound personality instrument that adheres to the requirements of South African legislation and excludes cultural bias. Research design, approach and method: The authors used a cross-sectional design. They measured the nine SAPI clusters identified in the qualitative stage of the SAPI project in 11 separate quantitative studies. Convenience sampling yielded 6735 participants. Statistical analysis focused on the construct validity and reliability of items. The authors eliminated items that showed poor performance, based on common psychometric criteria, and selected the best performing items to form part of the final version of the SAPI. Main findings: The authors developed 2573 items from the nine SAPI clusters. Of these, 2268 items were valid and reliable representations of the SAPI facets. Practical/managerial implications: The authors developed a large item pool. It measures personality in South Africa. Researchers can refine it for the SAPI. Furthermore, the project illustrates an approach that researchers can use in projects that aim to develop culturally-informed psychological measures. Contribution/value-add: Personality assessment is important for recruiting, selecting and developing employees. This study contributes to the current knowledge about the early processes researchers follow when they develop a personality instrument that measures personality fairly in different cultural groups, as the SAPI does.
Automated Item Generation with Recurrent Neural Networks.

Science.gov (United States)

von Davier, Matthias

2018-03-12

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
Understanding and quantifying cognitive complexity level in mathematical problem solving items

Directory of Open Access Journals (Sweden)

SUSAN E. EMBRETSON

2008-09-01

Full Text Available The linear logistic test model (LLTM; Fischer, 1973) has been applied to a wide variety of new tests. When the LLTM application involves item complexity variables that are both theoretically interesting and empirically supported, several advantages can result. These advantages include elaborating construct validity at the item level, defining variables for test design, predicting parameters of new items, item banking by sources of complexity and providing a basis for item design and item generation. However, despite the many advantages of applying LLTM to test items, it has been applied less often to understand the sources of complexity for large-scale operational test items. Instead, previously calibrated item parameters are modeled using regression techniques because raw item response data often cannot be made available. In the current study, both LLTM and regression modeling are applied to mathematical problem solving items from a widely used test. The findings from the two methods are compared and contrasted for their implications for continued development of ability and achievement tests based on mathematical problem solving items.
Item response theory - A first approach

Science.gov (United States)

Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

2017-07-01

Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
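The one-, two- and three-parameter logistic models named in this abstract can be illustrated with a short sketch. It is not taken from the paper: the function names and the parameter symbols (`theta` for ability, `a` for discrimination, `b` for difficulty, `c` for the lower asymptote or guessing parameter) follow common textbook notation and are assumptions here.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). Names are illustrative assumptions.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """Two-parameter model: the 3PL with the lower asymptote fixed at 0."""
    return irf_3pl(theta, a=a, b=b, c=0.0)

def irf_1pl(theta, b):
    """One-parameter model: the 2PL with discrimination fixed at 1."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)
```

When ability equals difficulty, the 1PL and 2PL probabilities are 0.5, while the 3PL probability is c + (1 - c)/2, so a nonzero guessing parameter lifts the whole curve.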
A Case Study on an Item Writing Process: Use of Test Specifications, Nature of Group Dynamics, and Individual Item Writers' Characteristics

Science.gov (United States)

Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa

2010-01-01

This article discusses a case study on an item writing process that reflects on our practical experience in an item development project. The purpose of the article is to share our lessons from the experience, aiming to demystify the item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…
A strategy for optimizing item-pool management

NARCIS (Netherlands)

Ariel, A.; van der Linden, Willem J.; Veldkamp, Bernard P.

2006-01-01

Item-pool management requires a balancing act between the input of new items into the pool and the output of tests assembled from it. A strategy for optimizing item-pool management is presented that is based on the idea of a periodic update of an optimal blueprint for the item pool to tune item
Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

Science.gov (United States)

Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

2017-11-01

The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
The role of attention in item-item binding in visual working memory.

Science.gov (United States)

Peterson, Dwight J; Naveh-Benjamin, Moshe

2017-09-01

An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Center for Media Literacy Unveils the CML Medialit Kit[TM]: A Free Educational Framework that Helps Students Challenge and Understand Media

Science.gov (United States)

Social Studies, 2004

2004-01-01

Five key questions form the basis of the new CML MediaLit Kit, an educational framework and curriculum guide developed by the Center for Media Literacy. Adaptable to all grades, the key questions help children and young people evaluate the thousands of media messages that bombard them daily. More than two years in development and available for…
Re-evaluating a vision-related quality of life questionnaire with item response theory (IRT) and differential item functioning (DIF) analyses

Directory of Open Access Journals (Sweden)

Knol Dirk L

2011-09-01

Full Text Available Abstract Background: For the Low Vision Quality Of Life questionnaire (LVQOL) it is unknown whether the psychometric properties are satisfactory when an item response theory (IRT) perspective is considered. This study evaluates some essential psychometric properties of the LVQOL questionnaire in an IRT model, and investigates differential item functioning (DIF). Methods: Cross-sectional data were used from an observational study among visually-impaired patients (n = 296). Calibration was performed for every dimension of the LVQOL in the graded response model. Item goodness-of-fit was assessed with the S-X² test. DIF was assessed on relevant background variables (i.e. age, gender, visual acuity, eye condition, rehabilitation type and administration type) with likelihood-ratio tests for DIF. The magnitude of DIF was interpreted by assessing the largest difference in expected scores between subgroups. Measurement precision was assessed by presenting test information curves; reliability with the index of subject separation. Results: All items of the LVQOL dimensions fitted the model. There was significant DIF on several items. For two items the maximum difference between expected scores exceeded one point, and DIF was found on multiple relevant background variables. Item 1 'Vision in general' from the "Adjustment" dimension and item 24 'Using tools' from the "Reading and fine work" dimension were removed. Test information was highest for the "Reading and fine work" dimension. Indices for subject separation ranged from 0.83 to 0.94. Conclusions: The items of the LVQOL showed satisfactory item fit to the graded response model; however, two items were removed because of DIF. The adapted LVQOL with 21 items is DIF-free and therefore seems highly appropriate for use in heterogeneous populations of visually impaired patients.
Evaluation of item candidates for a diabetic retinopathy quality of life item bank.

Science.gov (United States)

Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L

2013-09-01

We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.
Applications of Multidimensional Item Response Theory Models with Covariates to Longitudinal Test Data. Research Report. ETS RR-16-21

Science.gov (United States)

Fu, Jianbin

2016-01-01

The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…
Nursing Faculty Decision Making about Best Practices in Test Construction, Item Analysis, and Revision

Science.gov (United States)

Killingsworth, Erin Elizabeth

2013-01-01

With the widespread use of classroom exams in nursing education there is a great need for research on current practices in nursing education regarding this form of assessment. The purpose of this study was to explore how nursing faculty members make decisions about using best practices in classroom test construction, item analysis, and revision in…
Evaluation of the box and blocks test, stereognosis and item banks of activity and upper extremity function in youths with brachial plexus birth palsy.

Science.gov (United States)

Mulcahey, Mary Jane; Kozin, Scott; Merenda, Lisa; Gaughan, John; Tian, Feng; Gogola, Gloria; James, Michelle A; Ni, Pengsheng

2012-09-01

One of the greatest limitations to measuring outcomes in pediatric orthopaedics is the lack of effective instruments. Computer adaptive testing, which uses large item banks, selects only items that are relevant to a child's function based on a previous response and filters out items that are too easy, too hard, or simply not relevant to the child. In this way, computer adaptive testing provides a meaningful, efficient, and precise method to evaluate patient-reported outcomes. Banks of items that assess activity and upper extremity (UE) function have been developed for children with cerebral palsy and have enabled computer adaptive tests that showed strong reliability, strong validity, and broader content range when compared with traditional instruments. Because of the void in instruments for children with brachial plexus birth palsy (BPBP) and the importance of having a UE and activity scale, we were interested in how well these items worked in this population. A cross-sectional, multicenter study involving 200 children with BPBP was conducted. The box and block test (BBT) and stereognosis tests were administered and patient reports of UE function and activity were obtained with the cerebral palsy item banks. Differential item functioning (DIF) was examined. Predictive ability of the BBT and stereognosis was evaluated with a proportional odds logistic regression model. Spearman correlation coefficients (rs) were calculated to examine correlation between stereognosis and the BBT and between individual stereognosis items and the total stereognosis score. Six of the 86 items showed DIF, indicating that the activity and UE item banks may be useful for computer adaptive tests for children with BPBP. The penny and the button were the strongest predictors of impairment level (odds ratio = 0.34 to 0.40). There was a good positive relationship between total stereognosis and BBT scores (rs=0.60). The BBT had a good negative (rs=-0.55) and good positive (rs=0.55) relationship with
Developing a Numerical Ability Test for Students of Education in Jordan: An Application of Item Response Theory

Science.gov (United States)

Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader

2016-01-01

The purpose of the present study is to develop a test to measure the numerical ability of students of education. The sample of the study consisted of 504 students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed that acceptable psychometric properties of the test;…
Three controversies over item disclosure in medical licensure examinations

Directory of Open Access Journals (Sweden)

Yoon Soo Park

2015-09-01

Full Text Available In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – (1) fairness and validity, (2) impact on passing levels, and (3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration.
Development of a Mechanical Engineering Test Item Bank to promote learning outcomes-based education in Japanese and Indonesian higher education institutions

Directory of Open Access Journals (Sweden)

Jeffrey S. Cross

2017-11-01

Full Text Available Following on the 2008-2012 OECD Assessment of Higher Education Learning Outcomes (AHELO) feasibility study of civil engineering, in Japan a mechanical engineering learning outcomes assessment working group was established within the National Institute of Education Research (NIER), which became the Tuning National Center for Japan. The purpose of the project is to develop, among engineering faculty members, common understandings of engineering learning outcomes through the collaborative process of test item development, scoring, and sharing of results. By substantiating abstract level learning outcomes into concrete level learning outcomes that are attainable and assessable, and through measuring and comparing the students’ achievement of learning outcomes, it is anticipated that faculty members will be able to draw practical implications for educational improvement at the program and course levels. The development of a mechanical engineering test item bank began with test item development workshops, which led to a series of trial tests, and then to a large scale test implementation in 2016 of 348 first semester master’s students in 9 institutions in Japan, using both multiple choice questions designed to measure the mastery of basic and engineering sciences, and a constructive response task designed to measure “how well students can think like an engineer.” The same set of test items was translated from Japanese into English and Indonesian, and used to measure achievement of learning outcomes at Indonesia’s Institut Teknologi Bandung (ITB) on 37 rising fourth year undergraduate students. This paper highlights how learning outcomes assessment can effectively facilitate learning outcomes-based education, by documenting the experience of Japanese and Indonesian mechanical engineering faculty members engaged in the NIER Test Item Bank project. First published online: 30 November 2017
Concreteness effects in short-term memory: a test of the item-order hypothesis.

Science.gov (United States)

Roche, Jaclynn; Tolan, G Anne; Tehan, Gerald

2011-12-01

The following experiments explore word length and concreteness effects in short-term memory within an item-order processing framework. This framework asserts order memory is better for those items that are relatively easy to process at the item level. However, words that are difficult to process benefit at the item level for increased attention/resources being applied. The prediction of the model is that differential item and order processing can be detected in episodic tasks that differ in the degree to which item or order memory are required by the task. The item-order account has been applied to the word length effect such that there is a short word advantage in serial recall but a long word advantage in item recognition. The current experiment considered the possibility that concreteness effects might be explained within the same framework. In two experiments, word length (Experiment 1) and concreteness (Experiment 2) are examined using forward serial recall, backward serial recall, and item recognition. These results for word length replicate previous studies showing the dissociation in item and order tasks. The same was not true for the concreteness effect. In all three tasks concrete words were better remembered than abstract words. The concreteness effect cannot be explained in terms of an item-order trade off. PsycINFO Database Record (c) 2011 APA, all rights reserved.

Tailored Cloze: Improved with Classical Item Analysis Techniques.

Science.gov (United States)

Brown, James Dean

1988-01-01

The reliability and validity of a cloze procedure used as an English-as-a-second-language (ESL) test in China were improved by applying traditional item analysis and selection techniques. The 'best' test items were chosen on the basis of item facility and discrimination indices, and were administered as a 'tailored cloze.' 29 references listed.…
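As a rough illustration of the two classical indices named in this abstract, the sketch below computes item facility (proportion correct) and an upper-lower discrimination index. It is not the procedure from the paper: the function names, the use of top and bottom thirds, and the small response matrix are all assumptions made for the example.

```python
def item_facility(responses, item):
    """Proportion of examinees answering `item` correctly (0/1 scoring)."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(responses, item):
    """Facility in the top-scoring third minus facility in the bottom-scoring third.

    Splitting into thirds is an illustrative assumption; other splits are used too.
    """
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    k = max(1, len(ranked) // 3)                       # size of each extreme group
    upper, lower = ranked[:k], ranked[-k:]
    return item_facility(upper, item) - item_facility(lower, item)

# Hypothetical 0/1 response matrix: rows are examinees, columns are items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for i in range(3):
    print(i, item_facility(data, i), item_discrimination(data, i))
```

Items with mid-range facility and high discrimination would be the natural candidates to retain when tailoring a test in this way.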
Examination of the PROMIS upper extremity item bank.

Science.gov (United States)

Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
De item-reeks van de cognitieve screening test vergeleken met die van de mini-mental state examination

NARCIS (Netherlands)

Schmand, B.; Deelman, B. G.; Hooijer, C.; Jonker, C.; Lindeboom, J.

1996-01-01

The items of the 'mini-mental state examination' (MMSE) and a Dutch dementia screening instrument, the 'cognitive screening test' (CST), as well as the 'geriatric mental status schedule' (GMS) and the 'Dutch adult reading test' (DART), were administered to 4051 elderly people aged 65 to 84 years.
Detection of advance item knowledge using response times in computer adaptive testing

NARCIS (Netherlands)

Meijer, R.R.; Sotaridona, Leonardo

2006-01-01

We propose a new method for detecting item preknowledge in a CAT based on an estimate of “effective response time” for each item. Effective response time is defined as the time required for an individual examinee to answer an item correctly. An unusually short response time relative to the expected
Language-related differential item functioning between English and German PROMIS Depression items is negligible.

Science.gov (United States)

Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias

2017-12-01

To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R²-change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.
Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

Science.gov (United States)

Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

2017-01-01

Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
A unified factor-analytic approach to the detection of item and test bias: Illustration with the effect of providing calculators to students with dyscalculia

Directory of Open Access Journals (Sweden)

Lee, M. K.

2016-01-01

Full Text Available An absence of measurement bias against distinct groups is a prerequisite for the use of a given psychological instrument in scientific research or high-stakes assessment. Factor analysis is the framework explicitly adopted for the identification of such bias when the instrument consists of a multi-test battery, whereas item response theory is employed when the focus narrows to a single test composed of discrete items. Item response theory can be treated as a mild nonlinearization of the standard factor model, and thus the essential unity of bias detection at the two levels merits greater recognition. Here we illustrate the benefits of a unified approach with a real-data example, which comes from a statewide test of mathematics achievement where examinees diagnosed with dyscalculia were accommodated with calculators. We found that items that can be solved by explicit arithmetical computation became easier for the accommodated examinees, but the quantitative magnitude of this differential item functioning (measurement bias) was small.
An Investigation of Item Type in a Standards-Based Assessment.

Directory of Open Access Journals (Sweden)

Liz Hollingworth

2007-12-01

Full Text Available Large-scale state assessment programs use both multiple-choice and open-ended items on tests for accountability purposes. Certainly, there is an intuitive belief among some educators and policy makers that open-ended items measure something different than multiple-choice items. This study examined two item formats in custom-built, standards-based tests of achievement in Reading and Mathematics at grades 3-8. In this paper, we raise questions about the value of including open-ended items, given scoring costs, time constraints, and the higher probability of missing data from test-takers.
The Linear Logistic Test Model (LLTM) as the methodological foundation of item generating rules for a new verbal reasoning test

Directory of Open Access Journals (Sweden)

HERBERT POINSTINGL

2009-06-01

Full Text Available Based on the demand for new verbal reasoning tests to enrich psychological test inventory, a pilot version of a new test was analysed: the 'Family Relation Reasoning Test' (FRRT; Poinstingl, Kubinger, Skoda & Schechtner, forthcoming), in which several basic cognitive operations (logical rules) have been embedded/implemented. Given family relationships of varying complexity embedded in short stories, testees had to logically conclude the correct relationship between two individuals within a family. Using empirical data, the linear logistic test model (LLTM; Fischer, 1972), a special case of the Rasch model, was used to test the construct validity of the test: The hypothetically assumed basic cognitive operations had to explain the Rasch model's item difficulty parameters. After being shaped in LLTM's matrices of weights ((q_ij)), none of these operations were corroborated by means of Andersen's Likelihood Ratio Test.
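For readers unfamiliar with the LLTM, the constraint this abstract refers to can be written out as follows. This is the standard textbook form, not a formula quoted from the paper; the symbols $\theta_v$ (person ability), $\beta_i$ (item difficulty), $\eta_j$ (difficulty contribution of operation $j$) and $c$ (normalization constant) are notational assumptions, while $q_{ij}$ is the weight matrix mentioned above.

```latex
% Rasch model: probability that person v solves item i
P(X_{vi} = 1 \mid \theta_v, \beta_i)
  = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}

% LLTM: each item difficulty is a weighted sum of operation parameters
\beta_i = \sum_{j=1}^{p} q_{ij}\,\eta_j + c
```

Because the LLTM restricts the Rasch item difficulties to this linear form, its fit can be compared with that of the unrestricted Rasch model, which is the sense in which the hypothesized operations "had to explain" the item difficulty parameters.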
An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research.

Science.gov (United States)

Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J

2016-05-20

Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
Development of an item bank for the EORTC Role Functioning Computer Adaptive Test (EORTC RF-CAT)

DEFF Research Database (Denmark)

Gamper, Eva-Maria; Petersen, Morten Aa.; Aaronson, Neil

2016-01-01

a computer-adaptive test (CAT) for RF. This was part of a larger project whose objective is to develop a CAT version of the EORTC QLQ-C30 which is one of the most widely used HRQOL instruments in oncology. METHODS: In accordance with EORTC guidelines, the development of the RF-CAT comprised four phases...... with good psychometric properties. The resulting item bank exhibits excellent reliability (mean reliability = 0.85, median = 0.95). Using the RF-CAT may allow sample size savings from 11 % up to 50 % compared to using the QLQ-C30 RF scale. CONCLUSIONS: The RF-CAT item bank improves the precision...
A study on stability and medical implications for a complex delay model for CML with cell competition and treatment.

Science.gov (United States)

Rădulescu, I R; Cândea, D; Halanay, A

2014-12-21

We study a mathematical model describing the dynamics of leukemic and normal cell populations (stem-like and differentiated) in chronic myeloid leukemia (CML). This model is a system of four delay differential equations incorporating three types of cell division. The competition between normal and leukemic stem cell populations for the common microenvironment is taken into consideration. The stability of one steady state is investigated. The results are discussed via their medical interpretation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items.

Science.gov (United States)

Sinharay, Sandip; Jensen, Jens Ledet

2018-06-27

In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3-26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238-254, 2010; Glas & Dagohoy, Psychometrika 72:159-180, 2007; Guo & Drasgow, Int J Sel Assess 18:351-364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193-206, 1990; Sinharay, J Educ Behav Stat 42:46-68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307-322, 1986) and the Lugannani-Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475-490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients

DEFF Research Database (Denmark)

Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J. B.

2017-01-01

on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). METHODS: In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients...... model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study...... sample sizes with about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. CONCLUSION: A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient...
Threats to Validity When Using Open-Ended Items in International Achievement Studies: Coding Responses to the PISA 2012 Problem-Solving Test in Finland

Science.gov (United States)

Arffman, Inga

2016-01-01

Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…
Re-Fitting for a Different Purpose: A Case Study of Item Writer Practices in Adapting Source Texts for a Test of Academic Reading

Science.gov (United States)

Green, Anthony; Hawkey, Roger

2012-01-01

The important yet under-researched role of item writers in the selection and adaptation of texts for high-stakes reading tests is investigated through a case study involving a group of trained item writers working on the International English Language Testing System (IELTS). In the first phase of the study, participants were invited to reflect in…
easyCBM CCSS Math Item Scaling and Test Form Revision (2012-2013): Grades 6-8. Technical Report #1313

Science.gov (United States)

Anderson, Daniel; Alonzo, Julie; Tindal, Gerald

2012-01-01

The purpose of this technical report is to document the piloting and scaling of new easyCBM mathematics test items aligned with the Common Core State Standards (CCSS) and to describe the process used to revise and supplement the 2012 research version easyCBM CCSS math tests in Grades 6-8. For all operational 2012 research version test forms (10…
A Balance Sheet for Educational Item Banking.

Science.gov (United States)

Hiscox, Michael D.

Educational item banking presents observers with a considerable paradox. The development of test items from scratch is viewed as wasteful, a luxury in times of declining resources. On the other hand, item banking has failed to become a mature technology despite large amounts of money and the efforts of talented professionals. The question of which…
Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

NARCIS (Netherlands)

Andriessen, T.M.J.C.; Jong, B. de; Jacobs, B.; Werf, S.P. van der; Vos, P.E.

2009-01-01

PRIMARY OBJECTIVE: To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). METHODS: Daily
Three Modeling Applications to Promote Automatic Item Generation for Examinations in Dentistry.

Science.gov (United States)

Lai, Hollis; Gierl, Mark J; Byrne, B Ellen; Spielman, Andrew I; Waldschmidt, David M

2016-03-01

Test items created for dentistry examinations are often individually written by content experts. This approach to item development is expensive because it requires the time and effort of many content experts but yields relatively few items. The aim of this study was to describe and illustrate how items can be generated using a systematic approach. Automatic item generation (AIG) is an alternative method that allows a small number of content experts to produce large numbers of items by integrating their domain expertise with computer technology. This article describes and illustrates how three modeling approaches to item content-item cloning, cognitive modeling, and image-anchored modeling-can be used to generate large numbers of multiple-choice test items for examinations in dentistry. Test items can be generated by combining the expertise of two content specialists with technology supported by AIG. A total of 5,467 new items were created during this study. From substitution of item content, to modeling appropriate responses based upon a cognitive model of correct responses, to generating items linked to specific graphical findings, AIG has the potential for meeting increasing demands for test items. Further, the methods described in this study can be generalized and applied to many other item types. Future research applications for AIG in dental education are discussed.
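The item-cloning idea described above can be sketched as template substitution: a stem with placeholders is combined with lists of interchangeable elements. This is a minimal sketch, not the authors' item models; the stem, the element values, and the function name are hypothetical.

```python
from itertools import product

# Hypothetical item stem with two substitutable elements.
STEM = ("A {age}-year-old patient presents with {finding}. "
        "What is the most appropriate next step?")

# Hypothetical element lists that a content expert would supply.
ELEMENTS = {
    "age": [25, 45, 65],
    "finding": ["a fractured incisor", "gingival swelling"],
}


def generate_items(stem, elements):
    """Return one item stem per combination of element values."""
    keys = list(elements)
    combos = product(*(elements[k] for k in keys))
    return [stem.format(**dict(zip(keys, combo))) for combo in combos]


items = generate_items(STEM, ELEMENTS)
for item in items:
    print(item)
```

The number of generated stems is the product of the element list lengths, which is why a small number of experts can produce many items; in practice, constraints would be needed to exclude implausible combinations.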

Projective Item Response Model for Test-Independent Measurement

Science.gov (United States)

Ip, Edward Hak-Sing; Chen, Shyh-Huei

2012-01-01

The problem of fitting unidimensional item-response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that contains a major dimension of interest but that may also contain minor nuisance dimensions. Because fitting a unidimensional model to multidimensional data results in…
Matrix Sampling of Items in Large-Scale Assessments

Directory of Open Access Journals (Sweden)

Ruth A. Childs

2003-07-01

Matrix sampling of items, that is, division of a set of items into different versions of a test form, is used by several large-scale testing programs. Like other test designs, matrixed designs have both advantages and disadvantages. For example, testing time per student is less than if each student received all the items, but the comparability of student scores may decrease. Also, curriculum coverage is maintained, but reporting of scores becomes more complex. In this paper, matrixed designs are compared with more traditional designs in nine categories of costs: development costs, materials costs, administration costs, educational costs, scoring costs, reliability costs, comparability costs, validity costs, and reporting costs. In choosing among test designs, a testing program should examine the costs in light of its mandate(s), the content of the tests, and the financial resources available, among other considerations.
Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions

Directory of Open Access Journals (Sweden)

Yoon Soo ePark

2016-02-01

This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effect on item parameters and examinee ability.
Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions.

Science.gov (United States)

Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

2016-01-01

This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.
Item analysis and evaluation in the examinations in the faculty of ...

African Journals Online (AJOL)

2014-11-05

Nov 5, 2014 ... Key words: Classical test theory, item analysis, item difficulty, item discrimination, item response theory, reliability ... the probability of answering an item correctly or of attaining ..... A Monte Carlo comparison of item and person.
Compreensão da leitura: análise do funcionamento diferencial dos itens de um Teste de Cloze / Reading comprehension: differential item functioning analysis of a Cloze Test

Directory of Open Access Journals (Sweden)

Katya Luciane Oliveira

2012-01-01

This study aimed to investigate the fit of a Cloze test to the Rasch model and to evaluate differential item functioning (DIF) in relation to gender. The sample comprised 573 students from the 5th to 8th grades of state public elementary schools in the states of São Paulo and Minas Gerais. The Cloze test was administered collectively. The analysis of the instrument showed a good fit to the Rasch model, and the items were answered according to the expected pattern, also showing good fit. Regarding DIF, only three items differentiated by gender. Based on the data, the responses given by boys and girls were balanced.
Building an Evaluation Scale using Item Response Theory.

Science.gov (United States)

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
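The point that equal accuracy need not mean equal IRT score can be illustrated with a small sketch. It assumes a two-parameter logistic (2PL) model with known item parameters, which may differ from the model the authors fitted; the item parameters, response patterns, and function name are hypothetical.

```python
import math


def ability_mle_2pl(responses, discriminations, difficulties,
                    lo=-10.0, hi=10.0, tol=1e-10):
    """Maximum likelihood ability under a 2PL model, found by bisection."""
    if sum(responses) in (0, len(responses)):
        raise ValueError("the MLE is infinite for all-wrong or all-right patterns")

    def score_function(theta):
        # Derivative of the log-likelihood in theta (decreasing in theta).
        total = 0.0
        for x, a, b in zip(responses, discriminations, difficulties):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            total += a * (x - p)
        return total

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_function(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical test set: two weakly and two strongly discriminating items.
a = [0.5, 0.5, 2.0, 2.0]
b = [0.0, 0.0, 0.0, 0.0]

system_a = [0, 0, 1, 1]  # correct only on the highly discriminating items
system_b = [1, 1, 0, 0]  # correct only on the weakly discriminating items

for name, pattern in [("A", system_a), ("B", system_b)]:
    accuracy = sum(pattern) / len(pattern)
    theta = ability_mle_2pl(pattern, a, b)
    print(f"system {name}: accuracy = {accuracy:.2f}, theta = {theta:+.3f}")
```

Both hypothetical systems answer half of the items correctly, yet the ability estimates differ in sign because the 2PL model weights each response by the item's discrimination.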
Intracellular Retention of ABL Kinase Inhibitors Determines Commitment to Apoptosis in CML Cells

Science.gov (United States)

Dziadosz, Marek; Schnöder, Tina; Heidel, Florian; Schemionek, Mirle; Melo, Junia V.; Kindler, Thomas; Müller-Tidow, Carsten; Koschmieder, Steffen; Fischer, Thomas

2012-01-01

Clinical development of imatinib in CML established continuous target inhibition as a paradigm for successful tyrosine kinase inhibitor (TKI) therapy. However, recent reports suggested that transient potent target inhibition of BCR-ABL by high-dose TKI (HD-TKI) pulse-exposure is sufficient to irreversibly commit cells to apoptosis. Here, we report a novel mechanism of prolonged intracellular TKI activity upon HD-TKI pulse-exposure (imatinib, dasatinib) in BCR-ABL-positive cells. Comprehensive mechanistic exploration revealed dramatic intracellular accumulation of TKIs which closely correlated with induction of apoptosis. Cells were rescued from apoptosis upon HD-TKI pulse either by repetitive drug wash-out or by overexpression of ABC-family drug transporters. Inhibition of ABCB1 restored sensitivity to HD-TKI pulse-exposure. Thus, our data provide evidence that intracellular drug retention crucially determines biological activity of imatinib and dasatinib. These studies may refine our current thinking on critical requirements of TKI dose and duration of target inhibition for biological activity of TKIs. PMID:22815843
Evaluating the quality of medical multiple-choice items created with automated processes.

Science.gov (United States)

Gierl, Mark J; Lai, Hollis

2013-07-01

Computerised assessment raises formidable challenges because it requires large numbers of test items. Automatic item generation (AIG) can help address this test development problem because it yields large numbers of new items both quickly and efficiently. To date, however, the quality of the items produced using a generative approach has not been evaluated. The purpose of this study was to determine whether automatic processes yield items that meet standards of quality that are appropriate for medical testing. Quality was evaluated firstly by subjecting items created using both AIG and traditional processes to rating by a four-member expert medical panel using indicators of multiple-choice item quality, and secondly by asking the panellists to identify which items were developed using AIG in a blind review. Fifteen items from the domain of therapeutics were created in three different experimental test development conditions. The first 15 items were created by content specialists using traditional test development methods (Group 1 Traditional). The second 15 items were created by the same content specialists using AIG methods (Group 1 AIG). The third 15 items were created by a new group of content specialists using traditional methods (Group 2 Traditional). These 45 items were then evaluated for quality by a four-member panel of medical experts and were subsequently categorised as either Traditional or AIG items. Three outcomes were reported: (i) the items produced using traditional and AIG processes were comparable on seven of eight indicators of multiple-choice item quality; (ii) AIG items can be differentiated from Traditional items by the quality of their distractors, and (iii) the overall predictive accuracy of the four expert medical panellists was 42%. Items generated by AIG methods are, for the most part, equivalent to traditionally developed items from the perspective of expert medical reviewers. While the AIG method produced comparatively fewer plausible
A scale purification procedure for evaluation of differential item functioning

NARCIS (Netherlands)

Khalid, Muhammad Naveed; Glas, Cornelis A.W.

2014-01-01

Item bias or differential item functioning (DIF) has an important impact on the fairness of psychological and educational testing. In this paper, DIF is seen as a lack of fit to an item response (IRT) model. Inferences about the presence and importance of DIF require a process of so-called test
Identifying predictors of physics item difficulty: A linear regression approach

Science.gov (United States)

Mesic, Vanes; Muratovic, Hasnija

2011-06-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge
Identifying predictors of physics item difficulty: A linear regression approach

Directory of Open Access Journals (Sweden)

Hasnija Muratovic

2011-06-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal
Verification of Differential Item Functioning (DIF) Status of West ...

African Journals Online (AJOL)

This study investigated test item bias and Differential Item Functioning (DIF) of West African ... items in chemistry function differentially with respect to gender and location. In Aba education zone of Abia, 50 secondary schools were purposively ...
Analyzing force concept inventory with item response theory

Science.gov (United States)

Wang, Jing; Bao, Lei

2010-10-01

Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
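The three per-item parameters named in this abstract (difficulty, discrimination, and probability of a correct guess) correspond to the standard three-parameter logistic (3PL) item characteristic curve. The sketch below shows that curve only; the parameter values are hypothetical and are not estimates from the FCI study.

```python
import math


def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability, a: discrimination, b: difficulty,
    c: lower asymptote (probability of a correct guess).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical five-option multiple-choice item.
a, b, c = 1.2, 0.0, 0.2
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"theta = {theta:+.1f}  P(correct) = {icc_3pl(theta, a, b, c):.3f}")
```

At theta equal to the difficulty b, the probability is halfway between the guessing floor c and 1, and for very low ability it approaches c rather than zero, which is how the model accounts for guessing.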
Constructing the 32-item Fitness-to-Drive Screening Measure.

Science.gov (United States)

Medhizadah, Shabnam; Classen, Sherrilene; Johnson, Andrew M

2018-04-01

The Fitness-to-Drive Screening Measure © (FTDS) enables proxies to identify at-risk older drivers via 54 driving-related items, but may be too lengthy for widespread uptake. We reduced the number of items in the FTDS and validated the shorter measure, using 200 caregiver responses. Exploratory factor analysis and classical test theory techniques were used to determine the most interpretable factor model and the minimum number of items to be used for predicting fitness to drive. The extent to which the shorter FTDS predicted the results of the 54-item FTDS was evaluated through correlational analysis. A three-factor model best represented the empirical data. Classical test theory techniques led to the development of the 32-item FTDS. The 32-item FTDS was highly correlated (r = .99, p = .05) with the FTDS. The 32-item FTDS may provide raters with a faster and more efficient way to identify at-risk older drivers.
Item Modeling Concept Based on Multimedia Authoring

Directory of Open Access Journals (Sweden)

Janez Stergar

2008-09-01

This paper introduces a modern item design framework for computer-based assessment built on the Flash authoring environment. It discusses question design and the multimedia authoring environment used for item modeling. Item type templates are a structured means of collecting and storing item information that can be used to improve the efficiency and security of the innovative item design process. Templates can modernize the item design and enhance and speed up the development process. Along with content creation, multimedia has vast potential for use in innovative testing. The introduced item design template is based on a taxonomy of innovative items, which have great potential for expanding the content areas and construct coverage of an assessment. The presented item design approach is based on two GUIs: one for question design based on the implemented item design templates and one for user interaction tracking and retrieval. The concept of user interfaces based on Flash technology is discussed, as well as the implementation of the item design forms with multimedia authoring. An innovative method for user interaction storage and retrieval, based on PHP extending Flash capabilities in the proposed framework, is also introduced.
Utilizing Response Time Distributions for Item Selection in CAT

Science.gov (United States)

Fan, Zhewen; Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey

2012-01-01

Traditional methods for item selection in computerized adaptive testing only focus on item information without taking into consideration the time required to answer an item. As a result, some examinees may receive a set of items that take a very long time to finish, and information is not accrued as efficiently as possible. The authors propose two…
Biological Science: An Ecological Approach. BSCS Green Version. Teacher's Resource Book and Test Item Bank. Sixth Edition.

Science.gov (United States)

Biological Sciences Curriculum Study, Colorado Springs.

This book consists of four sections: (1) "Supplemental Materials"; (2) "Supplemental Investigations"; (3) "Test Item Bank"; and (4) "Blackline Masters." The first section provides additional background material related to selected chapters and investigations in the student book. Included are a periodic table of the elements, genetics problems and…
Nickel and cobalt release from jewellery and metal clothing items in Korea.

Science.gov (United States)

Cheong, Seung Hyun; Choi, You Won; Choi, Hae Young; Byun, Ji Yeon

2014-01-01

In Korea, the prevalence of nickel allergy has shown a sharply increasing trend. Cobalt contact allergy is often associated with concomitant reactions to nickel, and is more common in Korea than in western countries. The aim of the present study was to investigate the prevalence of items that release nickel and cobalt on the Korean market. A total of 471 items that included 193 branded jewellery, 202 non-branded jewellery and 76 metal clothing items were sampled and studied with a dimethylglyoxime (DMG) test and a cobalt spot test to detect nickel and cobalt release, respectively. Nickel release was detected in 47.8% of the tested items. The positive rates in the DMG test were 12.4% for the branded jewellery, 70.8% for the non-branded jewellery, and 76.3% for the metal clothing items. Cobalt release was found in 6.2% of items. Among the types of jewellery, belts and hair pins showed higher positive rates in both the DMG test and the cobalt spot test. Our study shows that the prevalence of items that release nickel or cobalt among jewellery and metal clothing items is high in Korea. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
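The per-category nickel release rates reported above can be checked against the overall rate with a short calculation. The positive counts are not given in the abstract; in this sketch they are back-calculated from the reported sample sizes and percentages, which is an assumption.

```python
# Sample sizes and DMG-positive rates (percent) as reported in the abstract.
samples = {
    "branded jewellery": (193, 12.4),
    "non-branded jewellery": (202, 70.8),
    "metal clothing items": (76, 76.3),
}

# Assumption: positive counts are recovered by rounding n * rate to the nearest item.
positives = {name: round(n * rate / 100) for name, (n, rate) in samples.items()}

total_items = sum(n for n, _ in samples.values())
total_positive = sum(positives.values())
overall_rate = 100 * total_positive / total_items

for name, count in positives.items():
    print(f"{name}: about {count} of {samples[name][0]} items positive")
print(f"overall: {total_positive} of {total_items} items ({overall_rate:.1f}%)")
```

Under this assumption the category counts add up to the 471 sampled items, and the pooled rate agrees with the 47.8% reported for all tested items.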
Identification of metallic items that caused nickel dermatitis in Danish patients.

Science.gov (United States)

Thyssen, Jacob P; Menné, Torkil; Johansen, Jeanne D

2010-09-01

Nickel allergy is prevalent as assessed by epidemiological studies. In an attempt to further identify and characterize sources that may result in nickel allergy and dermatitis, we analysed items identified by nickel-allergic dermatitis patients as causative of nickel dermatitis by using the dimethylglyoxime (DMG) test. Dermatitis patients with nickel allergy of current relevance were identified over a 2-year period in a tertiary referral patch test centre. When possible, their work tools and personal items were examined with the DMG test. Among 95 nickel-allergic dermatitis patients, 70 (73.7%) had metallic items investigated for nickel release. A total of 151 items were investigated, and 66 (43.7%) gave positive DMG test reactions. Objects were nearly all purchased or acquired after the introduction of the EU Nickel Directive. Only one object had been inherited, and only two objects had been purchased outside of Denmark. DMG testing is valuable as a screening test for nickel release and should be used to identify relevant exposures in nickel-allergic patients. Mainly consumer items, but also work tools used in an occupational setting, released nickel in dermatitis patients. This study confirmed 'risk items' from previous studies, including mobile phones.

Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

Science.gov (United States)

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

2013-01-01

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling

Science.gov (United States)

Koller, Ingrid; Levenson, Michael R.; Glück, Judith

2017-01-01

The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777
Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

Science.gov (United States)

Wan, Lei; Henly, George A.

2012-01-01

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Validity and Reliability of the 8-Item Work Limitations Questionnaire.

Science.gov (United States)

Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

2017-12-01

Purpose To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.
PENGEMBANGAN TES BERPIKIR KRITIS DENGAN PENDEKATAN ITEM RESPONSE THEORY (Developing a Critical Thinking Test with an Item Response Theory Approach)

Directory of Open Access Journals (Sweden)

Fajrianthi Fajrianthi

2016-06-01

Full Text Available This study aimed to produce a valid and reliable measurement instrument (a critical thinking test) for use in both educational and work settings in Indonesia. The research stages followed the test development steps of Hambleton and Jones (1993). The test blueprint and item writing were based on the concepts in the Watson-Glaser Critical Thinking Appraisal (WGCTA). In the WGCTA, critical thinking consists of five dimensions: Inference, Recognition Assumption, Deduction, Interpretation and Evaluation of arguments. The test was tried out on 1,453 participants in employee selection tests in Surabaya, Gresik, Tuban, Bojonegoro and Rembang. The dichotomous data were analyzed using an IRT model with two parameters, item discrimination and item difficulty. The analysis was carried out with the statistical program Mplus version 6.11. Before the IRT analysis, the assumptions were tested: unidimensionality, local independence and the Item Characteristic Curve (ICC). Analysis of the 68 items yielded 15 items with fairly good discrimination and item difficulty ranging from -4 to 2.448. The small number of good-quality items was attributed to weaknesses in selecting subject matter experts in critical thinking and in the choice of scoring method. Keywords: test development, critical thinking, item response theory DEVELOPING CRITICAL THINKING TEST UTILISING ITEM RESPONSE THEORY Abstract The present study aimed to develop a valid and reliable instrument for assessing critical thinking that can be implemented in both educational and work settings in Indonesia. Following Hambleton and Jones's (1993) procedures for test development, the study developed the instrument by employing the concept of critical thinking from the Watson-Glaser Critical Thinking Appraisal (WGCTA). The study included five dimensions of critical thinking as adopted from the WGCTA: Inference, Recognition
A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

Science.gov (United States)

Yao, Lihua; Schwarz, Richard D.

2006-01-01

Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

Science.gov (United States)

Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

2018-01-01

The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Designing a Virtual Item Bank Based on the Techniques of Image Processing

Science.gov (United States)

Liao, Wen-Wei; Ho, Rong-Guey

2011-01-01

One of the major weaknesses of the item exposure rates of figural items in Intelligence Quotient (IQ) tests lies in its inaccuracies. In this study, a new approach is proposed and a useful test tool known as the Virtual Item Bank (VIB) is introduced. The VIB combines Automatic Item Generation theory and image processing theory with the concepts of…
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

Science.gov (United States)

Magis, David; Facon, Bruno

2013-01-01

Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
Examining Construct Congruence for Psychometric Tests: A Note on an Extension to Binary Items and Nesting Effects

Science.gov (United States)

Raykov, Tenko; Marcoulides, George A.; Dimitrov, Dimiter M.; Li, Tatyana

2018-01-01

This article extends the procedure outlined in the article by Raykov, Marcoulides, and Tong for testing congruence of latent constructs to the setting of binary items and clustering effects. In this widely used setting in contemporary educational and psychological research, the method can be used to examine if two or more homogeneous…
Negative effects of item repetition on source memory

OpenAIRE

Kim, Kyungmi; Yi, Do-Joon; Raye, Carol L.; Johnson, Marcia K.

2012-01-01

In the present study, we explored how item repetition affects source memory for new item–feature associations (picture–location or picture–color). We presented line drawings varying numbers of times in Phase 1. In Phase 2, each drawing was presented once with a critical new feature. In Phase 3, we tested memory for the new source feature of each item from Phase 2. Experiments 1 and 2 demonstrated and replicated the negative effects of item repetition on incidental source memory. Prior item re...
Editorial Changes and Item Performance: Implications for Calibration and Pretesting

Directory of Open Access Journals (Sweden)

Heather Stoffel

2014-11-01

Full Text Available Previous research on the impact of text and formatting changes on test-item performance has produced mixed results. This matter is important because it is generally acknowledged that any change to an item requires that it be recalibrated. The present study investigated the effects of seven classes of stylistic changes on item difficulty, discrimination, and response time for a subset of 65 items that make up a standardized test for physician licensure completed by 31,918 examinees in 2012. One of two versions of each item (original or revised) was randomly assigned to examinees such that each examinee saw only two experimental items, with each item being administered to approximately 480 examinees. The stylistic changes had little or no effect on item difficulty or discrimination; however, one class of edits, changing an item from an open lead-in (incomplete statement) to a closed lead-in (direct question), did result in slightly longer response times. Data for nonnative speakers of English were analyzed separately with nearly identical results. These findings have implications for the conventional practice of repretesting (or recalibrating) items that have been subjected to minor editorial changes.
Psychometric aspects of item mapping for criterion-referenced interpretation and bookmark standard setting.

Science.gov (United States)

Huynh, Huynh

2010-01-01

Locating an item on an achievement continuum (item mapping) is well-established in technical work for educational/psychological assessment. Applications of item mapping may be found in criterion-referenced (CR) testing (or scale anchoring, Beaton and Allen, 1992; Huynh, 1994, 1998a, 2000a, 2000b, 2006), computer-assisted testing, test form assembly, and in standard setting methods based on ordered test booklets. These methods include the bookmark standard setting originally used for the CTB/TerraNova tests (Lewis, Mitzel, Green, and Patz, 1999), the item descriptor process (Ferrara, Perie, and Johnson, 2002) and a similar process described by Wang (2003) for multiple-choice licensure and certification examinations. While item response theory (IRT) models such as the Rasch and two-parameter logistic (2PL) models traditionally place a binary item at its location, Huynh has argued in the cited papers that such mapping may not be appropriate in selecting items for CR interpretation and scale anchoring.
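As a rough illustration of the item mapping discussed in this record: under the 2PL model, an item's location is its difficulty b, the ability at which the probability of a correct response is 0.5. Mapping the item at a different response probability (RP) moves its location by ln(RP / (1 - RP)) / a. The function names, the item parameters, and the RP value of 0.67 below are illustrative assumptions and are not taken from the cited papers.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (the Rasch model corresponds to a = 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mapped_location(a, b, rp=0.5):
    """Ability at which the item is answered correctly with probability rp."""
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must lie strictly between 0 and 1")
    return b + math.log(rp / (1.0 - rp)) / a

# Hypothetical item parameters, for illustration only.
a, b = 1.2, 0.4
loc_50 = mapped_location(a, b)           # the traditional location, equal to b
loc_67 = mapped_location(a, b, rp=0.67)  # a stricter criterion maps the item higher
```

With any RP above 0.5 the mapped location lies above b, which is one reason the choice of RP matters for criterion-referenced interpretation.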
More is not Always Better: The Relation between Item Response and Item Response Time in Raven’s Matrices

Directory of Open Access Journals (Sweden)

Frank Goldhammer

2015-03-01

Full Text Available The role of response time in completing an item can have very different interpretations. Responding more slowly could be positively related to success as the item is answered more carefully. However, the association may be negative if working faster indicates higher ability. The objective of this study was to clarify the validity of each assumption for reasoning items considering the mode of processing. A total of 230 persons completed a computerized version of Raven’s Advanced Progressive Matrices test. Results revealed that response time overall had a negative effect. However, this effect was moderated by items and persons. For easy items and able persons the effect was strongly negative, for difficult items and less able persons it was less negative or even positive. The number of rules involved in a matrix problem proved to explain item difficulty significantly. Most importantly, a positive interaction effect between the number of rules and item response time indicated that the response time effect became less negative with an increasing number of rules. Moreover, exploratory analyses suggested that the error type influenced the response time effect.
Performance on large-scale science tests: Item attributes that may impact achievement scores

Science.gov (United States)

Gordon, Janet Victoria

, characteristics of test items themselves and/or opportunities to learn. Suggestions for future research are made.
Item and test analysis to identify quality multiple choice questions (MCQs) from an assessment of medical students of Ahmedabad, Gujarat

Directory of Open Access Journals (Sweden)

Sanju Gajjar

2014-01-01

Full Text Available Background: Multiple choice questions (MCQs) are frequently used to assess students in different educational streams for their objectivity and wide reach of coverage in less time. However, the MCQs to be used must be of quality, which depends upon their difficulty index (DIF I), discrimination index (DI) and distracter efficiency (DE). Objective: To evaluate MCQs or items and develop a pool of valid items by assessing with DIF I, DI and DE and also to revise/store or discard items based on obtained results. Settings: Study was conducted in a medical school of Ahmedabad. Materials and Methods: An internal examination in Community Medicine was conducted after 40 hours teaching during 1st MBBS which was attended by 148 out of 150 students. Total 50 MCQs or items and 150 distractors were analyzed. Statistical Analysis: Data was entered and analyzed in MS Excel 2007 and simple proportions, mean, standard deviations, coefficient of variation were calculated and unpaired t test was applied. Results: Out of 50 items, 24 had "good to excellent" DIF I (31 - 60%) and 15 had "good to excellent" DI (> 0.25). Mean DE was 88.6%, considered as ideal/acceptable, and non functional distractors (NFD) were only 11.4%. Mean DI was 0.14. Poor DI (< 0.15) with negative DI in 10 items indicates poor preparedness of students and some issues with framing of at least some of the MCQs. Increased proportion of NFDs (incorrect alternatives selected by < 5% students) in an item decreases DE and makes the item easier. There were 15 items with 17 NFDs, while rest items did not have any NFD with mean DE of 100%. Conclusion: Study emphasizes the selection of quality MCQs which truly assess the knowledge and are able to differentiate the students of different abilities in correct manner.
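The three indices named in this abstract can be sketched as follows. The abstract does not give its formulas, so the upper/lower-group definitions used here are an assumption based on a common convention in item analysis; only the 5% cutoff for a non-functional distractor comes from the abstract. All counts in the example are hypothetical.

```python
def item_indices(high_correct, low_correct, group_size):
    """Difficulty index (% correct) and discrimination index from upper/lower groups.

    high_correct, low_correct: number answering correctly in each group
    group_size: number of examinees in each group (assumed equal)
    """
    n = 2 * group_size
    dif_i = 100.0 * (high_correct + low_correct) / n
    di = 2.0 * (high_correct - low_correct) / n
    return dif_i, di

def distractor_efficiency(distractor_counts, n_examinees, nfd_cutoff=0.05):
    """Percent of functional distractors, plus the number of non-functional ones.

    A distractor is non-functional when fewer than nfd_cutoff of examinees chose it.
    """
    nfd = sum(1 for c in distractor_counts if c / n_examinees < nfd_cutoff)
    de = 100.0 * (len(distractor_counts) - nfd) / len(distractor_counts)
    return de, nfd

# Hypothetical item: 40 examinees per group, three distractors, 148 examinees.
dif_i, di = item_indices(high_correct=30, low_correct=18, group_size=40)
de, nfd = distractor_efficiency([20, 9, 3], n_examinees=148)
```

Under the thresholds quoted in the abstract, this hypothetical item would fall in the "good to excellent" range for both DIF I and DI while carrying one non-functional distractor.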
The 12-item World Health Organization Disability Assessment Schedule II (WHO-DAS II): a nonparametric item response analysis

Directory of Open Access Journals (Sweden)

Fernandez Ana

2010-05-01

Full Text Available Abstract Background Previous studies have analyzed the psychometric properties of the World Health Organization Disability Assessment Schedule II (WHO-DAS II) using classical omnibus measures of scale quality. These analyses are sample dependent and do not model item responses as a function of the underlying trait level. The main objective of this study was to examine the effectiveness of the WHO-DAS II items and their options in discriminating between changes in the underlying disability level by means of item response analyses. We also explored differential item functioning (DIF) in men and women. Methods The participants were 3615 adult general practice patients from 17 regions of Spain, with a first diagnosed major depressive episode. The 12-item WHO-DAS II was administered by the general practitioners during the consultation. We used a non-parametric item response method (Kernel-Smoothing) implemented with the TestGraf software to examine the effectiveness of each item (item characteristic curves) and their options (option characteristic curves) in discriminating between changes in the underlying disability level. We examined composite DIF to know whether women had a higher probability than men of endorsing each item. Results Item response analyses indicated that the twelve items forming the WHO-DAS II perform very well. All items were determined to provide good discrimination across varying standardized levels of the trait. The items also had option characteristic curves that showed good discrimination, given that each increasing option became more likely than the previous as a function of increasing trait level. No gender-related DIF was found on any of the items. Conclusions All WHO-DAS II items were very good at assessing overall disability. Our results supported the appropriateness of the weights assigned to response option categories and showed an absence of gender differences in item functioning.
Applicability of Item Response Theory to the Korean Nurses' Licensing Examination

Directory of Open Access Journals (Sweden)

Geum-Hee Jeong

2005-06-01

Full Text Available To test the applicability of item response theory (IRT) to the Korean Nurses' Licensing Examination (KNLE), item analysis was performed after testing the unidimensionality and goodness-of-fit. The results were compared with those based on classical test theory. The results of the 330-item KNLE administered to 12,024 examinees in January 2004 were analyzed. Unidimensionality was tested using DETECT and the goodness-of-fit was tested using WINSTEPS for the Rasch model and Bilog-MG for the two-parameter logistic model. Item analysis and ability estimation were done using WINSTEPS. Using DETECT, Dmax ranged from 0.1 to 0.23 for each subject. The mean square value of the infit and outfit values of all items using WINSTEPS ranged from 0.1 to 1.5, except for one item in pediatric nursing, which scored 1.53. Of the 330 items, 218 (42.7%) were misfit using the two-parameter logistic model of Bilog-MG. The correlation coefficients between the difficulty parameter using the Rasch model and the difficulty index from classical test theory ranged from 0.9039 to 0.9699. The correlation between the ability parameter using the Rasch model and the total score from classical test theory ranged from 0.9776 to 0.9984. Therefore, the results of the KNLE fit unidimensionality and goodness-of-fit for the Rasch model. The KNLE should be a good sample for analysis according to the IRT Rasch model, so further research using IRT is possible.
Combining item and bulk material loss-detection uncertainties

International Nuclear Information System (INIS)

Eggers, R.F.

1982-01-01

Loss detection requirements, such as five formula kilograms with 99% probability of detection, which apply to the sum of losses from material in both item and bulk form, constitute a special problem for the nuclear material statistician. Requirements of this type are included in the Material Control and Accounting Reform Amendments described in the Advance Notice of Proposed Rule Making (Federal Register, 46(175):45144-46151). Attribute test sampling of items is the method used to detect gross defects in the inventory of items in a given control unit. Attribute sampling plans are designed to detect a loss of a specified goal quantity of material with a given probability. In contrast to the methods and statistical models used for item loss detection, bulk material loss detection requires all the material entering and leaving a control unit to be measured and the calculation of a loss estimator that will be tested against an appropriate alarm threshold. The alarm threshold is determined from an estimate of the error inherent in the components of the loss estimator. In this paper a simple graphical method of evaluating the combined capabilities of bulk material loss detection methods and item attribute testing procedures will be described. Quantitative results will be given for several cases, indicating how a decrease in the precision of the item loss detection method tends to force an increase in the precision of the bulk loss detection procedure in order to meet the overall detection requirement. 4 figures
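The attribute sampling idea in this abstract can be sketched with the hypergeometric model: if a control unit holds a known number of items, some of which are gross defects, the chance that a simple random sample contains at least one defective item follows directly from counting. This is a minimal sketch under that assumption and is not the paper's graphical method; the function names and inventory figures are hypothetical, and translating a goal quantity into a number of defective items is left out.

```python
from math import comb

def detection_probability(n_items, n_defects, sample_size):
    """P(at least one defective item in a simple random sample without replacement)."""
    if n_defects <= 0:
        raise ValueError("n_defects must be positive")
    if sample_size > n_items - n_defects:
        return 1.0  # the sample cannot avoid every defective item
    return 1.0 - comb(n_items - n_defects, sample_size) / comb(n_items, sample_size)

def min_sample_size(n_items, n_defects, target=0.99):
    """Smallest sample size whose detection probability reaches the target."""
    for n in range(n_items + 1):
        if detection_probability(n_items, n_defects, n) >= target:
            return n
    return n_items

# Hypothetical control unit: 500 items, of which 20 are assumed defective.
n_needed = min_sample_size(500, 20, target=0.99)
```

Lowering the target detection probability for the item stratum reduces the required sample, which is the trade-off against bulk measurement precision that the abstract describes.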
A signal detection-item response theory model for evaluating neuropsychological measures.

Science.gov (United States)

Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

2018-02-05

Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory (which permits the modeling of item difficulty and examinee ability) and from signal detection theory (which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias). We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the

An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

Science.gov (United States)

Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

2013-01-01

Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…
Evolution of BCR/ABL gene mutation in CML is time dependent and dependent on the pressure exerted by tyrosine kinase inhibitor.

Directory of Open Access Journals (Sweden)

Shantashri Vaidya

Full Text Available BACKGROUND: Mutations in the ABL kinase domain and SH3-SH2 domain of the BCR/ABL gene and amplification of the Philadelphia chromosome are the two important BCR/ABL dependent mechanisms of imatinib resistance. Here, we intended to study the role played by TKI, imatinib, in selection of gene mutations and development of chromosomal abnormalities in Indian CML patients. METHODS: Direct sequencing methodology was employed to detect mutations and conventional cytogenetics was done to identify Philadelphia duplication. RESULTS: Among the different mechanisms of imatinib resistance, kinase domain mutations (39%) of the BCR/ABL gene were seen to be more prevalent, followed by mutations in the SH3-SH2 domain (4%) and then BCR/ABL amplification with the least frequency (1%). The median duration of occurrence of mutation was significantly shorter for patients with front line imatinib than those pre-treated with hydroxyurea. Patients with high Sokal score (p = 0.003) showed significantly higher incidence of mutations, as compared to patients with low/intermediate score. Impact of mutations on the clinical outcome in AP and BC was observed to be insignificant. Of the 94 imatinib resistant patients, only 1 patient exhibited duplication of Philadelphia chromosome, suggesting a less frequent occurrence of this abnormality in Indian CML patients. CONCLUSION: Close monitoring at regular intervals and proper analysis of the disease resistance would facilitate early detection of resistance and thus aid in the selection of the most appropriate therapy.
Reliability and validity of the Spanish version of the 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) in young adults

Directory of Open Access Journals (Sweden)

García-Campayo Javier

2011-08-01

Full Text Available Abstract Background The 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) is an instrument for measuring resilience that has shown good psychometric properties in its original version in English. The aim of this study was to evaluate the validity and reliability of the Spanish version of the 10-item CD-RISC in young adults and to verify whether it is structured in a single dimension as in the original English version. Findings Cross-sectional observational study including 681 university students ranging in age from 18 to 30 years. The number of latent factors in the 10 items of the scale was analyzed by exploratory factor analysis. Confirmatory factor analysis was used to verify whether a single factor underlies the 10 items of the scale as in the original version in English. The convergent validity was analyzed by testing whether the mean of the scores of the mental component of SF-12 (MCS) and the quality of sleep as measured with the Pittsburgh Sleep Index (PSQI) were higher in subjects with better levels of resilience. The internal consistency of the 10-item CD-RISC was estimated using the Cronbach α test and test-retest reliability was estimated with the intraclass correlation coefficient. The Cronbach α coefficient was 0.85 and the test-retest intraclass correlation coefficient was 0.71. The mean MCS score and the level of quality of sleep in both men and women were significantly worse in subjects with lower resilience scores. Conclusions The Spanish version of the 10-item CD-RISC showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience. Our study confirmed that a single factor underlies the resilience construct, as was the case of the original scale in English.
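For readers unfamiliar with the internal-consistency statistic reported here, Cronbach's α compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula with sample variances; the function name and the small response matrix are hypothetical and unrelated to the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents, 3 items.
responses = [
    [3, 4, 3],
    [2, 2, 1],
    [4, 4, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items vary together; the study's reported coefficient of 0.85 would be computed the same way from its 10 items and 681 respondents.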
Negative effects of item repetition on source memory.

Science.gov (United States)

Kim, Kyungmi; Yi, Do-Joon; Raye, Carol L; Johnson, Marcia K

2012-08-01

In the present study, we explored how item repetition affects source memory for new item-feature associations (picture-location or picture-color). We presented line drawings varying numbers of times in Phase 1. In Phase 2, each drawing was presented once with a critical new feature. In Phase 3, we tested memory for the new source feature of each item from Phase 2. Experiments 1 and 2 demonstrated and replicated the negative effects of item repetition on incidental source memory. Prior item repetition also had a negative effect on source memory when different source dimensions were used in Phases 1 and 2 (Experiment 3) and when participants were explicitly instructed to learn source information in Phase 2 (Experiments 4 and 5). Importantly, when the order between Phases 1 and 2 was reversed, such that item repetition occurred after the encoding of critical item-source combinations, item repetition no longer affected source memory (Experiment 6). Overall, our findings did not support predictions based on item predifferentiation, within-dimension source interference, or general interference from multiple traces of an item. Rather, the findings were consistent with the idea that prior item repetition reduces attention to subsequent presentations of the item, decreasing the likelihood that critical item-source associations will be encoded.
Adaptation and validation into Portuguese language of the six-item cognitive impairment test (6CIT).

Science.gov (United States)

Apóstolo, João Luís Alves; Paiva, Diana Dos Santos; Silva, Rosa Carla Gomes da; Santos, Eduardo José Ferreira Dos; Schultz, Timothy John

2017-07-25

The six-item cognitive impairment test (6CIT) is a brief cognitive screening tool that can be administered to older people in 2-3 min. To adapt the 6CIT for the European Portuguese and determine its psychometric properties based on a sample recruited from several contexts (nursing homes; universities for older people; day centres; primary health care units). The original 6CIT was translated into Portuguese and the draft Portuguese version (6CIT-P) was back-translated and piloted. The accuracy of the 6CIT-P was assessed by comparison with the Portuguese Mini-Mental State Examination (MMSE). A convenience sample of 550 older people from various geographical locations in the north and centre of the country was used. The test-retest reliability coefficient was high (r = 0.95). The 6CIT-P also showed good internal consistency (α = 0.88) and corrected item-total correlations ranged between 0.32 and 0.90. Total 6CIT-P and MMSE scores were strongly correlated. The proposed 6CIT-P threshold for cognitive impairment is ≥10 in the Portuguese population, which gives sensitivity of 82.78% and specificity of 84.84%. The accuracy of 6CIT-P, as measured by area under the ROC curve, was 0.91. The 6CIT-P has high reliability and validity and is accurate when used to screen for cognitive impairment.
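The sensitivity and specificity quoted for the ≥10 threshold follow from a simple cross-tabulation of screening result against a reference classification. The sketch below assumes the reference status is already available as a boolean per person (in the study it would derive from the MMSE comparison); the function name and the scores are hypothetical.

```python
def screening_accuracy(scores, impaired, threshold=10):
    """Sensitivity and specificity when scores >= threshold count as screen-positive."""
    pairs = list(zip(scores, impaired))
    tp = sum(1 for s, y in pairs if s >= threshold and y)      # true positives
    fn = sum(1 for s, y in pairs if s < threshold and y)       # false negatives
    tn = sum(1 for s, y in pairs if s < threshold and not y)   # true negatives
    fp = sum(1 for s, y in pairs if s >= threshold and not y)  # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical 6CIT-P scores and reference classifications.
scores = [14, 9, 12, 4, 10, 6, 11, 2]
impaired = [True, True, True, False, False, False, True, False]
sensitivity, specificity = screening_accuracy(scores, impaired)
```

Repeating the calculation over a range of thresholds gives the points of the ROC curve whose area the abstract reports.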
Development of a Postacute Hospital Item Bank for the New Pediatric Evaluation of Disability Inventory-Computer Adaptive Test

Science.gov (United States)

Dumas, Helene M.

2010-01-01

The PEDI-CAT is a new computer adaptive test (CAT) version of the Pediatric Evaluation of Disability Inventory (PEDI). Additional PEDI-CAT items specific to postacute pediatric hospital care were recently developed using expert reviews and cognitive interviewing techniques. Expert reviews established face and construct validity, providing positive…
Better assessment of physical function: item improvement is neglected but essential.

Science.gov (United States)

Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

2009-01-01

Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models
Problems with the factor analysis of items: Solutions based on item response theory and item parcelling

Directory of Open Access Journals (Sweden)

Gideon P. De Bruin

2004-10-01

The factor analysis of items often produces spurious results in the sense that unidimensional scales appear multidimensional. This may be ascribed to a failure to meet the assumptions of linearity and normality on which factor analysis is based. Item response theory is explicitly designed for the modelling of the non-linear relations between ordinal variables and provides a strong alternative to the factor analysis of items. Items may also be combined in parcels that are more likely to satisfy the assumptions of factor analysis than the individual items are. The use of the Rasch rating scale model and the factor analysis of parcels is illustrated with data obtained with the Locus of Control Inventory. The results of these analyses are compared with the results obtained through the factor analysis of items. It is shown that the Rasch rating scale model and the factoring of parcels produce superior results to the factor analysis of items. Recommendations for the analysis of scales are made.
Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

Science.gov (United States)

Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

2016-05-01

Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability at the item level and the total-score level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. At the total-score level, we assessed relative reliability with intraclass correlation coefficients (ICC2.1), absolute reliability (measurement error) with the standard error of measurement (SEM) and the smallest detectable change (SDC), and internal consistency with Cronbach's alpha. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements in three items, for which the chance of a lower score at retest was 11-13% higher. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
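The relationship between the reported SEM and SDC can be checked with a short calculation. The sketch below assumes the commonly used formulas SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM; the abstract does not state which formulas the authors applied, and the function names are illustrative.

```python
import math

def sem_from_sd(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)

def sdc_from_sem(sem: float, z: float = 1.96) -> float:
    """Smallest detectable change for the difference between two measurements."""
    return z * math.sqrt(2.0) * sem

# SEM as reported in the abstract (already rounded to one decimal).
reported_sem = 2.6
sdc = sdc_from_sem(reported_sem)
print(round(sdc, 1))
```

Under these assumptions the rounded SEM of 2.6 gives an SDC of about 7.2, close to the reported 7.1; a gap of this size is what rounding of the SEM would produce.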
Creating a Database for Test Items in National Examinations (pp ...

African Journals Online (AJOL)

Nekky Umera

different programmers create files and application programs over a long period. .... In theory or essay questions, alternative methods of solving problems are explored and ... Unworthy items are those that do not focus on the central concept or.
Item information and discrimination functions for trinary PCM items

NARCIS (Netherlands)

Akkermans, Wies; Muraki, Eiji

1997-01-01

For trinary partial credit items the shape of the item information and the item discrimination function is examined in relation to the item parameters. In particular, it is shown that these functions are unimodal if δ2 – δ1 < 4 ln 2 and bimodal otherwise. The locations and values of the maxima are
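The unimodal/bimodal threshold can be explored numerically. The following is a minimal sketch, assuming the standard partial credit model with unit discrimination (so that item information equals the conditional variance of the item score); the function names and the grid settings are illustrative choices, not taken from the report.

```python
import math

def pcm_probs(theta, delta1, delta2):
    """Category probabilities (scores 0, 1, 2) for a trinary partial credit item."""
    logits = [0.0, theta - delta1, 2.0 * theta - delta1 - delta2]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def item_information(theta, delta1, delta2):
    """Item information, computed as the variance of the item score given theta."""
    p = pcm_probs(theta, delta1, delta2)
    mean = p[1] + 2.0 * p[2]
    return sum(pk * (k - mean) ** 2 for k, pk in enumerate(p))

def count_modes(delta1, delta2, lo=-10.0, hi=10.0, step=0.01):
    """Count strict local maxima of the information function on a grid."""
    n = int(round((hi - lo) / step)) + 1
    info = [item_information(lo + i * step, delta1, delta2) for i in range(n)]
    return sum(1 for i in range(1, n - 1) if info[i - 1] < info[i] > info[i + 1])

threshold = 4.0 * math.log(2.0)  # about 2.77
for gap in (2.0, 4.0):
    print(gap, gap < threshold, count_modes(-gap / 2.0, gap / 2.0))
```

A gap of 2.0 between the two step parameters lies below the threshold and yields one maximum, while a gap of 4.0 lies above it and yields two, one near each step parameter.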
Algorithms for the Construction of Parallel Tests by Zero-One Programming. Project Psychometric Aspects of Item Banking No. 7. Research Report 86-7.

Science.gov (United States)

Boekkooi-Timminga, Ellen

Nine methods for automated test construction are described. All are based on the concepts of information from item response theory. Two general kinds of methods for the construction of parallel tests are presented: (1) sequential test design; and (2) simultaneous test design. Sequential design implies that the tests are constructed one after the…
Determination of radionuclides in environmental test items at CPHR: traceability and uncertainty calculation.

Science.gov (United States)

Carrazana González, J; Fernández, I M; Capote Ferrera, E; Rodríguez Castro, G

2008-11-01

Information about how the laboratory of Centro de Protección e Higiene de las Radiaciones (CPHR), Cuba establishes its traceability to the International System of Units for the measurement of radionuclides in environmental test items is presented. A comparison among different methodologies of uncertainty calculation, including an analysis of the feasibility of using the Kragten-spreadsheet approach, is shown. In the specific case of the gamma spectrometric assay, the influence of each parameter, and the identification of the major contributor, in the relative difference between the methods of uncertainty calculation (Kragten and partial derivative) is described. The reliability of the uncertainty calculation results reported by the commercial software Gamma 2000 from Silena is analyzed.
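The two uncertainty-calculation approaches named in the abstract can be compared on a toy model. The sketch below is an illustration only: the activity formula, the input values and their standard uncertainties are hypothetical and are not taken from the paper, and the inputs are assumed to be uncorrelated. The Kragten approach shifts each input by its standard uncertainty and records the change in the result; the partial-derivative approach uses first-order propagation.

```python
import math

def activity(counts, efficiency, emission_prob, live_time_s, mass_kg):
    """Hypothetical model: activity concentration (Bq/kg) from a net peak area."""
    return counts / (efficiency * emission_prob * live_time_s * mass_kg)

# Illustrative values and standard uncertainties (assumptions, not measured data).
values = {"counts": 10000.0, "efficiency": 0.05, "emission_prob": 0.85,
          "live_time_s": 3600.0, "mass_kg": 0.5}
uncerts = {"counts": 100.0, "efficiency": 0.0015, "emission_prob": 0.0085,
           "live_time_s": 0.0, "mass_kg": 0.005}

def kragten_uncertainty(model, values, uncerts):
    """Kragten approach: perturb one input at a time by its standard uncertainty."""
    base = model(**values)
    total = 0.0
    for name, u in uncerts.items():
        shifted = dict(values)
        shifted[name] = values[name] + u
        total += (model(**shifted) - base) ** 2
    return math.sqrt(total)

def partial_derivative_uncertainty(values, uncerts):
    """First-order propagation; for a pure quotient model relative variances add."""
    rel_var = sum((uncerts[name] / values[name]) ** 2 for name in values)
    return activity(**values) * math.sqrt(rel_var)

u_kragten = kragten_uncertainty(activity, values, uncerts)
u_partial = partial_derivative_uncertainty(values, uncerts)
print(u_kragten, u_partial, abs(u_kragten - u_partial) / u_partial)
```

With these made-up inputs the two results differ by a few percent, and the difference comes mainly from the input with the largest relative uncertainty, because the finite perturbation picks up the curvature of the model that the linear approximation ignores.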
Determination of radionuclides in environmental test items at CPHR: Traceability and uncertainty calculation

International Nuclear Information System (INIS)

Carrazana Gonzalez, J.; Fernandez, I.M.; Capote Ferrera, E.; Rodriguez Castro, G.

2008-01-01

Information about how the laboratory of Centro de Proteccion e Higiene de las Radiaciones (CPHR), Cuba establishes its traceability to the International System of Units for the measurement of radionuclides in environmental test items is presented. A comparison among different methodologies of uncertainty calculation, including an analysis of the feasibility of using the Kragten-spreadsheet approach, is shown. In the specific case of the gamma spectrometric assay, the influence of each parameter, and the identification of the major contributor, in the relative difference between the methods of uncertainty calculation (Kragten and partial derivative) is described. The reliability of the uncertainty calculation results reported by the commercial software Gamma 2000 from Silena is analyzed
Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach

Directory of Open Access Journals (Sweden)

JOSEPH P. EIMICKE

2009-06-01

The aims of this paper are to present findings related to differential item functioning (DIF) in the Patient Reported Outcome Measurement Information System (PROMIS) depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data) with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item, "I felt I had no energy," was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type 1 error) was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals.
Item response theory scoring and the detection of curvilinear relationships.

Science.gov (United States)

Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A

2017-03-01

Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
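The contrast between the two response processes can be made concrete with two toy response functions. This is a simplified sketch: the dominance curve is an ordinary logistic function, and the ideal point curve is a Gaussian-shaped stand-in chosen only because it is single-peaked. Neither is claimed to be the model used in the study, and all names and parameter values are illustrative.

```python
import math

def dominance_prob(theta, location, slope=1.0):
    """Dominance process: agreement probability rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))

def ideal_point_prob(theta, location, width=1.0):
    """Simplified ideal point process: agreement peaks where theta matches the item."""
    return math.exp(-((theta - location) ** 2) / (2.0 * width ** 2))

thetas = [-3.0, -1.5, 0.0, 1.5, 3.0]
for t in thetas:
    print(t, round(dominance_prob(t, 0.0), 3), round(ideal_point_prob(t, 0.0), 3))
```

For an item located at 0, a respondent at 3.0 is very likely to agree under the dominance curve but unlikely to agree under the ideal point curve, which is why a sum score can misrepresent high-standing respondents when items follow an ideal point process.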
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

Science.gov (United States)

Feinberg, Richard A; Clauser, Amanda L

2016-10-01

In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. The results of the current study did not support the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
A Comparison of Item Selection Procedures Using Different Ability Estimation Methods in Computerized Adaptive Testing Based on the Generalized Partial Credit Model

Science.gov (United States)

Ho, Tsung-Han

2010-01-01

Computerized adaptive testing (CAT) provides a highly efficient alternative to the paper-and-pencil test. By selecting items that match examinees' ability levels, CAT not only can shorten test length and administration time but it can also increase measurement precision and reduce measurement error. In CAT, maximum information (MI) is the most…
Changes in Word Usage Frequency May Hamper Intergenerational Comparisons of Vocabulary Skills: An Ngram Analysis of Wordsum, WAIS, and WISC Test Items

Science.gov (United States)

Roivainen, Eka

2014-01-01

Research on secular trends in mean intelligence test scores shows smaller gains in vocabulary skills than in nonverbal reasoning. One possible explanation is that vocabulary test items become outdated faster compared to nonverbal tasks. The history of the usage frequency of the words on five popular vocabulary tests, the GSS Wordsum, Wechsler…
Diagnostic accuracy of a two-item Drug Abuse Screening Test (DAST-2).

Science.gov (United States)

Tiet, Quyen Q; Leyva, Yani E; Moos, Rudolf H; Smith, Brandy

2017-11-01

Drug use is prevalent and costly to society, but individuals with drug use disorders (DUDs) are under-diagnosed and under-treated, particularly in primary care (PC) settings. Drug screening instruments have been developed to identify patients with DUDs and facilitate treatment. The Drug Abuse Screening Test (DAST) is one of the most well-known drug screening instruments. However, similar to many such instruments, it is too long for routine use in busy PC settings. This study developed and validated a briefer and more practical DAST for busy PC settings. We recruited 1300 PC patients in two Department of Veterans Affairs (VA) clinics. Participants responded to a structured diagnostic interview. We randomly selected half of the sample to develop and the other half to validate the new instrument. We employed signal detection techniques to select the best DAST items to identify DUDs (based on the MINI) and negative consequences of drug use (measured by the Inventory of Drug Use Consequences). Performance indicators were calculated. The two-item DAST (DAST-2) was 97% sensitive and 91% specific for DUDs in the development sample and 95% sensitive and 89% specific in the validation sample. It was highly sensitive and specific for DUD and negative consequences of drug use in subgroups of patients, including gender, age, race/ethnicity, marital status, educational level, and posttraumatic stress disorder status. The DAST-2 is an appropriate drug screening instrument for routine use in PC settings in the VA and may be applicable in broader range of PC clinics. Published by Elsevier Ltd.

Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning.

Science.gov (United States)

Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

2014-02-01

To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight were according to diagnosis; the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight were according to age, of which five were in positively worded items that younger patients were more likely to endorse. One was according to gender (women were more likely to report crying), and three were according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two instances of DIF corresponded to differences of 4.6 and 9.8 points, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
Development of an item bank and computer adaptive test for role functioning

DEFF Research Database (Denmark)

Anatchkova, Milena D; Rose, Matthias; Ware, John E

2012-01-01

Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning.
Quantification of Nε-(2-Furoylmethyl)-L-lysine (furosine), Nε-(Carboxymethyl)-L-lysine (CML), Nε-(Carboxyethyl)-L-lysine (CEL) and total lysine through stable isotope dilution assay and tandem mass spectrometry

NARCIS (Netherlands)

Troise, A.D.; Fiore, A.; Wiltafsky, M.; Fogliano, V.

2015-01-01

The control of Maillard reaction (MR) is a key point to ensure processed foods quality. Due to the presence of a primary amino group on its side chain, lysine is particularly prone to chemical modifications with the formation of Amadori products (AP), Nε-(Carboxymethyl)-L-lysine (CML),
Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

Science.gov (United States)

He, Yong

2013-01-01

Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
Quantification of Nε-(2-Furoylmethyl)-L-lysine (furosine), Nε-(Carboxymethyl)-L-lysine (CML), Nε-(Carboxyethyl)-L-lysine (CEL) and total lysine through stable isotope dilution assay and tandem mass spectrometry.

Science.gov (United States)

Troise, Antonio Dario; Fiore, Alberto; Wiltafsky, Markus; Fogliano, Vincenzo

2015-12-01

The control of Maillard reaction (MR) is a key point to ensure processed foods quality. Due to the presence of a primary amino group on its side chain, lysine is particularly prone to chemical modifications with the formation of Amadori products (AP), Nε-(Carboxymethyl)-L-lysine (CML), Nε-(Carboxyethyl)-L-lysine (CEL). A new analytical strategy was proposed which allowed to simultaneously quantify lysine, CML, CEL and the Nε-(2-Furoylmethyl)-L-lysine (furosine), the indirect marker of AP. The procedure is based on stable isotope dilution assay followed by liquid chromatography tandem mass spectrometry. It showed high sensitivity and good reproducibility and repeatability in different foods. The limit of detection and the RSD% were lower than 5 ppb and below 8%, respectively. Results obtained with the new procedure not only improved the knowledge about the reliability of thermal treatment markers, but also defined new insights in the relationship between Maillard reaction products and their precursors. Copyright © 2015 Elsevier Ltd. All rights reserved.
Diagnostic radiography as a risk factor for chronic myeloid and monocytic leukaemia (CML)

International Nuclear Information System (INIS)

Preston-Martin, S.; Thomas, D.C.; Yu, M.C.; Henderson, B.E.

1989-01-01

The study included 136 Los Angeles County residents aged 20-69 with chronic myeloid and monocytic leukemia (CML) diagnosed from 1979 to 1985 (cases) and 136 neighbourhood controls. During the 3-20 years before diagnosis, more cases than controls had radiographic examinations of back, gastrointestinal (GI) tract and kidneys, and cases more often had GI and back radiography on multiple occasions (odds ratio (OR) for back X-rays on five or more occasions = 12.0; P < 0.01). Published estimates were used to assign a minimum dose to active bone marrow for various radiographic procedures. ORs were estimated for cumulative marrow doses for each of four time periods (3-5 years, 6-10 years, 11-20 years and 3-20 years before diagnosis). ORs for exposure to 0-99, 100-999, 1000-1999 and ≥ 2000 mrad in the 3-20 years before diagnosis were 1.0, 1.4, 1.6 and 2.4 (P for highest exposure category and P for trend both < 0.05). The association was strongest for 6-10 years before diagnosis, and effects of radiation exposure during this period remained significant after consideration of other risk factors in a logistic regression analysis. (author)
Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

Science.gov (United States)

LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

2015-04-01

Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
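The Mantel-Haenszel approach to DIF can be sketched for a single dichotomized item. The code below is illustrative only: it assumes the item has been reduced to endorse/not endorse and that respondents are stratified by a matching score, and the counts are invented for the example rather than taken from the study. The conversion to the delta scale uses the conventional factor of -2.35.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is (ref_endorse, ref_not, focal_endorse, focal_not).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        num += a * d / n
        den += b * c / n
    return num / den

def mh_delta(alpha_mh):
    """Mantel-Haenszel odds ratio expressed on the delta scale."""
    return -2.35 * math.log(alpha_mh)

# Hypothetical counts for three score strata (healthy = reference, medical = focal).
strata = [(30, 70, 20, 80), (50, 50, 40, 60), (80, 20, 70, 30)]
alpha = mantel_haenszel_odds_ratio(strata)
print(round(alpha, 3), round(mh_delta(alpha), 3))
```

An odds ratio of 1 (delta of 0) indicates that, among respondents matched on the total score, both groups are equally likely to endorse the item; values further from 1 indicate larger uniform DIF.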
Validation of the Spanish versions of the long (26 items) and short (12 items) forms of the Self-Compassion Scale (SCS).

Science.gov (United States)

Garcia-Campayo, Javier; Navarro-Gil, Mayte; Andrés, Eva; Montero-Marin, Jesús; López-Artal, Lorena; Demarzo, Marcelo Marcos Piva

2014-01-10

Self-compassion is a key psychological construct for assessing clinical outcomes in mindfulness-based interventions. The aim of this study was to validate the Spanish versions of the long (26 item) and short (12 item) forms of the Self-Compassion Scale (SCS). The translated Spanish versions of both subscales were administered to two independent samples: Sample 1 was comprised of university students (n = 268) who were recruited to validate the long form, and Sample 2 was comprised of Aragon Health Service workers (n = 271) who were recruited to validate the short form. In addition to SCS, the Mindful Attention Awareness Scale (MAAS), the State-Trait Anxiety Inventory-Trait (STAI-T), the Beck Depression Inventory (BDI) and the Perceived Stress Questionnaire (PSQ) were administered. Construct validity, internal consistency, test-retest reliability and convergent validity were tested. The Confirmatory Factor Analysis (CFA) of the long and short forms of the SCS confirmed the original six-factor model in both scales, showing goodness of fit. Cronbach's α for the 26 item SCS was 0.87 (95% CI = 0.85-0.90) and ranged between 0.72 and 0.79 for the 6 subscales. Cronbach's α for the 12-item SCS was 0.85 (95% CI = 0.81-0.88) and ranged between 0.71 and 0.77 for the 6 subscales. The long (26-item) form of the SCS showed a test-retest coefficient of 0.92 (95% CI = 0.89-0.94). The Intraclass Correlation (ICC) for the 6 subscales ranged from 0.84 to 0.93. The short (12-item) form of the SCS showed a test-retest coefficient of 0.89 (95% CI: 0.87-0.93). The ICC for the 6 subscales ranged from 0.79 to 0.91. The long and short forms of the SCS exhibited a significant negative correlation with the BDI, the STAI and the PSQ, and a significant positive correlation with the MAAS. The correlation between the total score of the long and short SCS form was r = 0.92. The Spanish versions of the long (26-item) and short (12-item) forms of the SCS are valid and
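Cronbach's α, the internal-consistency statistic reported above, can be computed directly from an item-score matrix. This is a minimal sketch using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores); the tiny data set is made up for illustration and has no connection to the SCS data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: five respondents, four items scored 1-5.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```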
‘Forget me (not)?’ – Remembering forget-items versus un-cued items in directed forgetting

Directory of Open Access Journals (Sweden)

Bastian Zwissler

2015-11-01

Humans need to be able to selectively control their memories. Here, we investigate the underlying processes in item-method directed forgetting and compare the classic active memory cues in this paradigm with a passive instruction. Typically, individual items are presented and each is followed by either a forget- or remember-instruction. On a surprise test of all items, memory is then worse for to-be-forgotten items (TBF) compared to to-be-remembered items (TBR). This is thought to result from selective rehearsal of TBR, or from active inhibition of TBF, or from both. However, evidence suggests that if a forget instruction initiates active processing, paradoxical effects may also arise. To investigate the underlying mechanisms, four experiments were conducted where un-cued items (UI) were introduced and recognition performance was compared between TBR, TBF and UI stimuli. Accuracy was encouraged via a performance-dependent monetary bonus. Across all experiments, including perceptually fully matched variants, memory accuracy for TBF was reduced compared to TBR, but better than for UI. Moreover, participants used a more conservative response criterion when responding to TBF stimuli. Thus, ironically, the F cue results in active processing, but this does not have inhibitory effects that would impair recognition memory beyond an un-cued baseline condition. This casts doubt on inhibitory accounts of item-method directed forgetting and is also difficult to reconcile with pure selective rehearsal of TBR. While the F-cue does induce active processing, this does not result in particularly successful forgetting. The pattern seems most consistent with the notion of ironic processing.
The effect of heightened awareness of observation on consumption of a multi-item laboratory test meal in females.

Science.gov (United States)

Robinson, Eric; Proctor, Michael; Oldham, Melissa; Masic, Una

2016-09-01

Human eating behaviour is often studied in the laboratory, but whether the extent to which a participant believes that their food intake is being measured influences consumption of different meal items is unclear. Our main objective was to examine whether heightened awareness of observation of food intake affects consumption of different food items during a lunchtime meal. One hundred and fourteen female participants were randomly assigned to an experimental condition designed to heighten participant awareness of observation or a condition in which awareness of observation was lower, before consuming an ad libitum multi-item lunchtime meal in a single session study. Under conditions of heightened awareness, participants tended to eat less of an energy dense snack food (cookies) in comparison to the less aware condition. Consumption of other meal items and total energy intake were similar in the heightened awareness vs. less aware condition. Exploratory secondary analyses suggested that the effect heightened awareness had on reduced cookie consumption was dependent on weight status, as well as trait measures of dietary restraint and disinhibition, whereby only participants with overweight/obesity, high disinhibition or low restraint reduced their cookie consumption. Heightened awareness of observation may cause females to reduce their consumption of an energy dense snack food during a test meal in the laboratory and this effect may be moderated by participant individual differences. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Integrating Test-Form Formatting into Automated Test Assembly

Science.gov (United States)

Diao, Qi; van der Linden, Wim J.

2013-01-01

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
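The item-selection part of automated test assembly can be pictured as a 0-1 decision per item. The sketch below is a toy stand-in: it enumerates all candidate forms by brute force instead of calling a mixed integer programming solver, uses the two-parameter logistic information function, and works on a hypothetical six-item bank. The item parameters, content areas and function names are all assumptions made for the example.

```python
import math
from itertools import combinations

# Hypothetical item bank: (item id, discrimination a, difficulty b, content area).
BANK = [
    ("i1", 1.2, -0.5, "algebra"),
    ("i2", 0.8, 0.0, "algebra"),
    ("i3", 1.5, 0.3, "geometry"),
    ("i4", 1.0, 1.0, "geometry"),
    ("i5", 1.4, -0.1, "algebra"),
    ("i6", 0.9, 0.2, "geometry"),
]

def information_2pl(a, b, theta):
    """Fisher information of a two-parameter logistic item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def assemble(bank, length, theta, min_per_area=1):
    """Pick the form of the given length with maximum information at theta,
    subject to a minimum number of items per content area."""
    areas = {item[3] for item in bank}
    best, best_info = None, -1.0
    for form in combinations(bank, length):
        counts = {area: 0 for area in areas}
        for item in form:
            counts[item[3]] += 1
        if min(counts.values()) < min_per_area:
            continue  # content constraint violated
        info = sum(information_2pl(a, b, theta) for _, a, b, _ in form)
        if info > best_info:
            best, best_info = form, info
    return best, best_info

form, info = assemble(BANK, length=3, theta=0.0)
print([item[0] for item in form], round(info, 3))
```

Enumeration is only feasible for very small banks; a solver-based formulation would express the same objective and constraints with binary decision variables, and formatting decisions such as item order would add further variables and constraints.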
Evaluating construct validity of the second version of the Copenhagen Psychosocial Questionnaire through analysis of differential item functioning and differential item effect

DEFF Research Database (Denmark)

Bjorner, Jakob Bue; Pejtersen, Jan Hyld

2010-01-01

AIMS: To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). METHODS: We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a ...
Using Automated Processes to Generate Test Items And Their Associated Solutions and Rationales to Support Formative Feedback

Directory of Open Access Journals (Sweden)

Mark Gierl

2015-08-01

Automatic item generation is the process of using item models to produce assessment tasks using computer technology. An item model is similar to a template that highlights the elements in the task that must be manipulated to produce new items. The purpose of our study is to describe an innovative method for generating large numbers of diverse and heterogeneous items along with their solutions and associated rationales to support formative feedback. We demonstrate the method by generating items in two diverse content areas, mathematics and nonverbal reasoning.
Exposure Control Using Adaptive Multi-Stage Item Bundles.

Science.gov (United States)

Luecht, Richard M.

This paper presents a multistage adaptive testing test development paradigm that promises to handle content balancing and other test development needs, psychometric reliability concerns, and item exposure. The bundled multistage adaptive testing (BMAT) framework is a modification of the computer-adaptive sequential testing framework introduced by…
The basics of item response theory using R

CERN Document Server

Baker, Frank B

2017-01-01

This graduate-level textbook is a tutorial for item response theory that covers both the basics of item response theory and the use of R for preparing graphical presentation in writings about the theory. Item response theory has become one of the most powerful tools used in test construction, yet one of the barriers to learning and applying it is the considerable amount of sophisticated computational effort required to illustrate even the simplest concepts. This text provides the reader access to the basic concepts of item response theory freed of the tedious underlying calculations. It is intended for those who possess limited knowledge of educational measurement and psychometrics. Rather than presenting the full scope of item response theory, this textbook is concise and practical and presents basic concepts without becoming enmeshed in underlying mathematical and computational complexities. Clearly written text and succinct R code allow anyone familiar with statistical concepts to explore and apply item re...
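As an illustration of the kind of basic concept such a text covers, the sketch below evaluates an item characteristic curve under the two-parameter logistic model, P(θ) = 1 / (1 + exp(−a(θ − b))). It is written in Python rather than R and is not taken from the book; the function name `icc_2pl` and the parameter values are assumptions chosen for illustration.

```python
from math import exp


def icc_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model.

    theta: examinee ability; a: item discrimination; b: item difficulty.
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


# Tabulate the curve for a hypothetical item with a = 1.2 and b = 0.5.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc_2pl(theta, a=1.2, b=0.5), 3))
```

When ability equals the item difficulty (θ = b), the probability is exactly 0.5, and a larger `a` makes the curve steeper around that point.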
Three-item Direct Observation Screen (TIDOS) for autism spectrum disorder.

Science.gov (United States)

Oner, Pinar; Oner, Ozgur; Munir, Kerim

2014-08-01

We compared ratings on the Three-Item Direct Observation Screen test for autism spectrum disorders completed by pediatric residents with the Social Communication Questionnaire parent reports as an augmentative tool for improving autism spectrum disorder screening performance. We examined three groups of children (18-60 months) comparable in age (18-24 month, 24-36 month, and 36-60 month preschool subgroups) and gender distribution: n = 86 with Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.) autism spectrum disorders; n = 76 with developmental delay without autism spectrum disorders; and n = 97 with typical development. The Three-Item Direct Observation Screen test included the following items: (a) Joint Attention, (b) Eye Contact, and (c) Responsiveness to Name. The parent Social Communication Questionnaire ratings had a sensitivity of .73 and specificity of .70 for diagnosis of autism spectrum disorders. The Three-Item Direct Observation Screen test item Joint Attention had a sensitivity of .82 and specificity of .90, Eye Contact had a sensitivity of .89 and specificity of .91, and Responsiveness to Name had a sensitivity of .67 and specificity of .87. In the Three-Item Direct Observation Screen test, having at least one of the three items positive had a sensitivity of .95 and specificity of .85. Age, diagnosis of autism spectrum disorder, and developmental level were important factors affecting sensitivity and specificity. The results indicate that augmentation of autism spectrum disorder screening by observational items completed by trained pediatric-oriented professionals can be a highly effective tool in improving screening performance. If supported by future population studies, the results suggest that primary care practitioners will be able to be trained to use this direct procedure to augment screening for autism spectrum disorders in the community. © The Author(s) 2013.
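The sensitivity and specificity figures reported above follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below shows the arithmetic and the "at least one of three items positive" decision rule. The counts are invented round numbers, not the study's data, and the function names are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases the screen flags
    specificity = tn / (tn + fp)  # share of non-cases the screen clears
    return sensitivity, specificity


def screen_positive(joint_attention, eye_contact, responsiveness_to_name):
    """Screen is positive when at least one of the three items is positive."""
    return joint_attention or eye_contact or responsiveness_to_name


# Hypothetical counts chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=170, fp=30)
print(round(sens, 2), round(spec, 2))
print(screen_positive(False, True, False))
```

An "any item positive" rule flags more children than any single item does, so it can only raise sensitivity relative to each item alone while lowering or preserving specificity.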
[The molecular-cytogenetic characterization and tyrosine kinase inhibitors efficacy in newly diagnosed chronic phase CML patients with variant Philadelphia chromosomes].

Science.gov (United States)

Zhao, J J; Zhang, Y L; Zhang, S J; Zhou, J; Yu, F K; Zu, Y L; Zhao, H F; Li, Z; Song, Y P

2018-03-14

Objective: To investigate the molecular-cytogenetic characteristics of chronic-phase chronic myeloid leukemia (CML-CP) patients with variant Philadelphia (Ph) chromosomes (vPh) and the impact of vPh on tyrosine kinase inhibitor (TKI) therapy. Methods: Clinical data from 32 patients with vPh chromosomes were collected and compared with data from 703 patients with the typical Ph chromosome; all had newly diagnosed CML-CP, were on first-line imatinib (IM), and carried the P210 BCR-ABL transcript. Results: There was no significant difference in demographic and hematological characteristics between vPh and classic Ph patients. Of the 32 vPh cases, 3 (9.4%) were simple variant translocations. Among the remaining 29 cases with complex variant translocations, 28 (87.5%) involved 3 chromosomes and only 1 (3.1%) involved 4 chromosomes. All chromosomes except 8, 15, 18, X, and Y were involved. Chromosomes 12q (15.5%) and 1p (12.1%) were involved more frequently. The most common FISH signal pattern was 2G2R1Y (74.1%), followed by 1G1R2F (14.8%), 2G1R1Y (3.7%), 1G2R1Y (3.7%), and 1G1R1Y (3.7%). Complete cytogenetic response (CCyR) (P = 0.269) and major molecular response (MMR) (P = 0.391) did not differ between simple and complex mechanisms. Compared with classic Ph patients, patients with vPh had a higher rate of primary IM resistance (χ² = 3.978, P = 0.046), especially primary hematological resistance (χ² = 7.870, P = 0.005), but the differences in CCyR (χ² = 0.192, P = 0.661), MMR (χ² = 0.822, P = 0.365), EFS (χ² = 0.509, P = 0.476), and OS (χ² = 3.485, P = 0.062) were not statistically significant. Multivariate analysis showed that the presence of vPh did not affect OS (RR = 0.692, 95% CI 0.393-1.765, P = 0.658), EFS (RR = 0.893, 95% CI 0.347-2.132, P = 0.126), or PFS (RR = 1.176, 95% CI 0.643-2.682, P = 0.703). Conclusion: CML-CP patients with vPh and classic Ph had similar demographic and hematological characteristics. Except for 22q11, 9q34, the
Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

Science.gov (United States)

Solano-Flores, Guillermo

2014-01-01

This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…
The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

Science.gov (United States)

Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

2008-10-01

Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified, cohort study. Participants were 1002 community-dwelling women aged 65 years old or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6 monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatric Society/British Geriatric Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.
A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

Science.gov (United States)

Fukuhara, Hirotaka; Kamata, Akihito

2011-01-01

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

Psychometric properties of WISC-III items

Directory of Open Access Journals (Sweden)

Vera Lúcia Marques de Figueiredo

2008-09-01

Full Text Available A test is improved through the selection, substitution, or revision of items, and analyzing an item increases the test's validity and reliability. This article presents results on the psychometric properties of the items in the WISC-III subtests with respect to difficulty, discrimination, and validity. The WISC-III is widely used in the assessment of intelligence, and knowing the quality of its items is essential for the professional who administers the test. The analyses were based on the scores from 801 test protocols, administered during research adapting the test to a Brazilian context. The analyses showed that the adapted items had adequate psychometric characteristics, allowing the instrument to be used as a reliable means of diagnosis.
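Item difficulty and discrimination, the two item properties named in this abstract, are commonly computed in classical item analysis as the proportion of correct responses and the corrected item-total correlation. The sketch below shows one such computation on an invented 5 x 3 response matrix; it is an assumption about a typical procedure, not the analysis used in the WISC-III study, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_stats(responses):
    """Return (difficulty, discrimination) per item for 0/1 scored responses.

    Difficulty is the proportion correct; discrimination is the correlation
    between the item score and the total score on the remaining items.
    """
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


# Hypothetical scored responses: rows are examinees, columns are items.
RESPONSES = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

stats = item_stats(RESPONSES)
for j, (difficulty, discrimination) in enumerate(stats, start=1):
    print(f"item {j}: difficulty={difficulty:.2f} discrimination={discrimination:.2f}")
```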
Methodology for the development and calibration of the SCI-QOL item banks.

Science.gov (United States)

Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

2015-05-01

To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.
Single-Item Measurement of Suicidal Behaviors: Validity and Consequences of Misclassification.

Directory of Open Access Journals (Sweden)

Alexander J Millner

Full Text Available Suicide is a leading cause of death worldwide. Although research has made strides in better defining suicidal behaviors, there has been less focus on accurate measurement. Currently, the widespread use of self-report, single-item questions to assess suicide ideation, plans and attempts may contribute to measurement problems and misclassification. We examined the validity of single-item measurement and the potential for statistical errors. Over 1,500 participants completed an online survey containing single-item questions regarding a history of suicidal behaviors, followed by questions with more precise language, multiple response options and narrative responses to examine the validity of single-item questions. We also conducted simulations to test whether common statistical tests are robust against the degree of misclassification produced by the use of single-items. We found that 11.3% of participants that endorsed a single-item suicide attempt measure engaged in behavior that would not meet the standard definition of a suicide attempt. Similarly, 8.8% of those who endorsed a single-item measure of suicide ideation endorsed thoughts that would not meet standard definitions of suicide ideation. Statistical simulations revealed that this level of misclassification substantially decreases statistical power and increases the likelihood of false conclusions from statistical tests. Providing a wider range of response options for each item reduced the misclassification rate by approximately half. Overall, the use of single-item, self-report questions to assess the presence of suicidal behaviors leads to misclassification, increasing the likelihood of statistical decision errors. Improving the measurement of suicidal behaviors is critical to increase understanding and prevention of suicide.
Item-nonspecific proactive interference in monkeys' auditory short-term memory.

Science.gov (United States)

Bigelow, James; Poremba, Amy

2015-09-01

Recent studies using the delayed matching-to-sample (DMS) paradigm indicate that monkeys' auditory short-term memory (STM) is susceptible to proactive interference (PI). During the task, subjects must indicate whether sample and test sounds separated by a retention interval are identical (match) or not (nonmatch). If a nonmatching test stimulus also occurred on a previous trial, monkeys are more likely to incorrectly make a "match" response (item-specific PI). However, it is not known whether PI may be caused by sounds presented on prior trials that are similar, but nonidentical to the current test stimulus (item-nonspecific PI). This possibility was investigated in two experiments. In Experiment 1, memoranda for each trial comprised tones with a wide range of frequencies, thus minimizing item-specific PI and producing a range of frequency differences among nonidentical tones. In Experiment 2, memoranda were drawn from a set of eight artificial sounds that differed from each other by one, two, or three acoustic dimensions (frequency, spectral bandwidth, and temporal dynamics). Results from both experiments indicate that subjects committed more errors when previously-presented sounds were acoustically similar (though not identical) to the test stimulus of the current trial. Significant effects were produced only by stimuli from the immediately previous trial, suggesting that item-nonspecific PI is less perseverant than item-specific PI, which can extend across noncontiguous trials. Our results contribute to existing human and animal STM literature reporting item-nonspecific PI caused by perceptual similarity among memoranda. Together, these observations underscore the significance of both temporal and discriminability factors in monkeys' STM. Copyright © 2015 Elsevier B.V. All rights reserved.
Medial temporal lobe contributions to cued retrieval of items and contexts.

Science.gov (United States)

Hannula, Deborah E; Libby, Laura A; Yonelinas, Andrew P; Ranganath, Charan

2013-10-01

Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model-namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. Copyright © 2013 Elsevier Ltd. All rights reserved.
Medial Temporal Lobe Contributions to Cued Retrieval of Items and Contexts

Science.gov (United States)

Hannula, Deborah E.; Libby, Laura A.; Yonelinas, Andrew P.; Ranganath, Charan

2013-01-01

Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model – namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. PMID:23466350
Missouri Assessment Program (MAP), Spring 2000: Secondary Science, Released Items, Grade 10.

Science.gov (United States)

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This assessment sample provides information on the Missouri Assessment Program (MAP) for grade 10 science. The sample consists of six items taken from the test booklet and scoring guides for the six items. The items assess ecosystems, mechanics, and data analysis. (MM)
Item Construction and Psychometric Models Appropriate for Constructed Responses

Science.gov (United States)

1991-08-01

which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more...
Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples.

Science.gov (United States)

Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D

2015-01-01

To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development-classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)-in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
Order information and free recall: evaluating the item-order hypothesis.

Science.gov (United States)

Mulligan, Neil W; Lozito, Jeffrey P

2007-05-01

The item-order hypothesis proposes that order information plays an important role in recall from long-term memory, and it is commonly used to account for the moderating effects of experimental design in memory research. Recent research (Engelkamp, Jahn, & Seiler, 2003; McDaniel, DeLosh, & Merritt, 2000) raises questions about the assumptions underlying the item-order hypothesis. Four experiments tested these assumptions by examining the relationship between free recall and order memory for lists of varying length (8, 16, or 24 unrelated words or pictures). Some groups were given standard free-recall instructions, other groups were explicitly instructed to use order information in free recall, and other groups were given free-recall tests intermixed with tests of order memory (order reconstruction). The results for short lists were consistent with the assumptions of the item-order account. For intermediate-length lists, explicit order instructions and intermixed order tests made recall more reliant on order information, but under standard conditions, order information played little role in recall. For long lists, there was little evidence that order information contributed to recall. In sum, the assumptions of the item-order account held for short lists, received mixed support with intermediate lists, and received no support for longer lists.
Effectiveness of Item Response Theory (IRT) Proficiency Estimation Methods under Adaptive Multistage Testing. Research Report. ETS RR-15-11

Science.gov (United States)

Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry

2015-01-01

The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…
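The 2-stage design described here, with one routing module and three second-stage modules giving three paths, can be sketched as a simple routing rule. The cut scores, module labels, and the use of the number-correct score for routing are assumptions for illustration; the report's actual routing rule is not given in this excerpt.

```python
def route(stage1_correct, low_cut=4, high_cut=8):
    """Pick the Stage 2 module from the Stage 1 number-correct score.

    The cut scores are hypothetical; operational designs derive them
    from the modules' psychometric properties.
    """
    if stage1_correct < low_cut:
        return "easy"
    if stage1_correct < high_cut:
        return "medium"
    return "hard"


# Each examinee takes the routing module and exactly one Stage 2 module.
paths = {("routing", route(score)) for score in range(0, 13)}
print(sorted(paths))
```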
Linguistic Simplification of Mathematics Items: Effects for Language Minority Students in Germany

Science.gov (United States)

Haag, Nicole; Heppt, Birgit; Roppelt, Alexander; Stanat, Petra

2015-01-01

In large-scale assessment studies, language minority students typically obtain lower test scores in mathematics than native speakers. Although this performance difference was related to the linguistic complexity of test items in some studies, other studies did not find linguistically demanding math items to be disproportionally more difficult for…
Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model

Science.gov (United States)

Suh, Youngsuk

2016-01-01

This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Automated Scoring of Short-Answer Open-Ended GRE® Subject Test Items. ETS GRE® Board Research Report No. 04-02. ETS RR-08-20

Science.gov (United States)

Attali, Yigal; Powers, Don; Freedman, Marshall; Harrison, Marissa; Obetz, Susan

2008-01-01

This report describes the development, administration, and scoring of open-ended variants of GRE® Subject Test items in biology and psychology. These questions were administered in a Web-based experiment to registered examinees of the respective Subject Tests. The questions required a short answer of 1-3 sentences, and responses were automatically…
The DIF Identification in Constructed Response Items Using Partial Credit Model

OpenAIRE

Heri Retnawati

2017-01-01

This study aimed to identify the load, type, and significance of differential item functioning (DIF) in constructed-response items using the partial credit model (PCM). The data were the test instruments and the responses to PISA-like test items completed by 386 ninth-grade students and 460 tenth-grade students, all about 15 years old, in the Special Region of Yogyakarta, Indonesia. The analysis toward the item characteris...
An Item Response Theory-Based, Computerized Adaptive Testing Version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

Science.gov (United States)

Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe

2016-01-01

Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…
Sources of interference in item and associative recognition memory.

Science.gov (United States)

Osth, Adam F; Dennis, Simon

2015-04-01

A powerful theoretical framework for exploring recognition memory is the global matching framework, in which a cue's memory strength reflects the similarity of the retrieval cues being matched against the contents of memory simultaneously. Contributions at retrieval can be categorized as matches and mismatches to the item and context cues, including the self match (match on item and context), item noise (match on context, mismatch on item), context noise (match on item, mismatch on context), and background noise (mismatch on item and context). We present a model that directly parameterizes the matches and mismatches to the item and context cues, which enables estimation of the magnitude of each interference contribution (item noise, context noise, and background noise). The model was fit within a hierarchical Bayesian framework to 10 recognition memory datasets that use manipulations of strength, list length, list strength, word frequency, study-test delay, and stimulus class in item and associative recognition. Estimates of the model parameters revealed at most a small contribution of item noise that varies by stimulus class, with virtually no item noise for single words and scenes. Despite the unpopularity of background noise in recognition memory models, background noise estimates dominated at retrieval across nearly all stimulus classes with the exception of high frequency words, which exhibited equivalent levels of context noise and background noise. These parameter estimates suggest that the majority of interference in recognition memory stems from experiences acquired before the learning episode. (c) 2015 APA, all rights reserved.
Difference in method of administration did not significantly impact item response

DEFF Research Database (Denmark)

Bjorner, Jakob B; Rose, Matthias; Gandek, Barbara

2014-01-01

PURPOSE: To test the impact of method of administration (MOA) on the measurement characteristics of items developed in the Patient-Reported Outcomes Measurement Information System (PROMIS). METHODS: Two non-overlapping parallel 8-item forms from each of three PROMIS domains (physical function... assistant (PDA), or personal computer (PC) on the Internet, and a second form by PC, in the same administration. Structural invariance, equivalence of item responses, and measurement precision were evaluated using confirmatory factor analysis and item response theory methods. RESULTS: Multigroup... levels in IVR, PQ, or PDA administration as compared to PC. Availability of large item response theory-calibrated PROMIS item banks allowed for innovations in study design and analysis.
Glycotoxin and Autoantibodies Are Additive Environmentally Determined Predictors of Type 1 Diabetes

Science.gov (United States)

Beyan, Huriya; Riese, Harriette; Hawa, Mohammed I.; Beretta, Guisi; Davidson, Howard W.; Hutton, John C.; Burger, Huibert; Schlosser, Michael; Snieder, Harold; Boehm, Bernhard O.; Leslie, R. David

2012-01-01

In type 1 diabetes, diabetes-associated autoantibodies, including islet cell antibodies (ICAs), reflect adaptive immunity, while increased serum Nε-carboxymethyl-lysine (CML), an advanced glycation end product, is associated with proinflammation. We assessed whether serum CML and autoantibodies predicted type 1 diabetes and to what extent they were determined by genetic or environmental factors. Of 7,287 unselected schoolchildren screened, 115 were ICA+ and were tested for baseline CML and diabetes autoantibodies and followed (for median 7 years), whereas a random selection (n = 2,102) had CML tested. CML and diabetes autoantibodies were determined in a classic twin study of twin pairs discordant for type 1 diabetes (32 monozygotic, 32 dizygotic pairs). CML was determined by enzyme-linked immunosorbent assay, autoantibodies were determined by radioimmunoprecipitation, ICA was determined by indirect immunofluorescence, and HLA class II genotyping was determined by sequence-specific oligonucleotides. CML was increased in ICA+ and prediabetic schoolchildren and in diabetic and nondiabetic twins (all P < 0.001). Elevated levels of CML in ICA+ children were a persistent, independent predictor of diabetes progression, in addition to autoantibodies and HLA risk. In twins model fitting, familial environment explained 75% of CML variance, and nonshared environment explained all autoantibody variance. Serum CML, a glycotoxin, emerged as an environmentally determined diabetes risk factor, in addition to autoimmunity and HLA genetic risk, and a potential therapeutic target. PMID:22396204
Advances in psychometrics: from Classical Test Theory to Item Response Theory

Directory of Open Access Journals (Sweden)

Laisa Marcorela Andreoli Sartes

2013-01-01

Full Text Available In the twentieth century, the development and evaluation of the psychometric properties of tests were based mainly on Classical Test Theory (CTT). Many tests are long and redundant, with measures that are influenced by the characteristics of the sample of individuals assessed during their development; some of these limitations are consequences of using CTT. Item Response Theory (IRT) emerged as a possible solution to some limitations of CTT, improving the quality of the evaluation of test structure. In this text we critically compare the characteristics of CTT and IRT as methods for evaluating the psychometric properties of tests. The advantages and limitations of each method are discussed.

RhoA: A therapeutic target for chronic myeloid leukemia

Directory of Open Access Journals (Sweden)

Molli Poonam R

2012-03-01

Full Text Available Abstract Background Chronic Myeloid Leukemia (CML) is a malignant pluripotent stem cell disorder of myeloid cells. In CML patients, polymorphonuclear leukocytes (PMNL), the terminally differentiated cells of the myeloid series, exhibit defects in several actin-dependent functions such as adhesion, motility, chemotaxis, agglutination, phagocytosis and microbicidal activities. A definite and global abnormality was observed in stimulation of actin polymerization in CML PMNL. The signalling molecules ras and rhoGTPases regulate spatial and temporal polymerization of actin and thus a broad range of physiological processes. Therefore, the status of these GTPases as well as actin was studied in resting and fMLP-stimulated normal and CML PMNL. Methods To study expression of GTPases and actin, Western blotting and flow cytometry analysis were done, while spatial expression and colocalization of these proteins were studied by using laser confocal microscopy. To study the effect of inhibitors on cell proliferation, a CCK-8 assay was done. Significance of differences in expression of proteins within the samples and between normal and CML was tested by using the Wilcoxon signed rank test and the Mann-Whitney test, respectively. Bivariate and partial correlation analyses were done to study the relationship between all the parameters. Results In CML PMNL, actin expression and its architecture were altered and stimulation of actin polymerization was absent. Differences were also observed in expression, organization or stimulation of all three GTPases in normal and CML PMNL. In normal PMNL, ras was the critical GTPase regulating expression of rhoGTPases and actin and actin polymerization. But in CML PMNL, rhoA took a central place. In accordance with these findings, treatment with rho/ROCK pathway inhibitors resulted in specific growth inhibition of CML cell lines. Conclusions RhoA has emerged as the key molecule responsible for functional defects in CML PMNL and therefore can be used as a
Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

Science.gov (United States)

Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

2006-11-01

We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
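The two-step decision rule described in this abstract can be sketched as follows. This is a minimal illustration only: the function name, the 0.05 significance level, and the 10% coefficient-change threshold are assumptions made for the example, since the abstract speaks only of statistical significance and a "marked effect". The inputs are assumed to come from already-fitted nested ordinal logistic regression models.

```python
def classify_dif(interaction_p, beta_without_group, beta_with_group,
                 alpha=0.05, change_threshold=0.10):
    """Classify one item as 'nonuniform', 'uniform', or 'none'.

    interaction_p      -- p-value of the ability-by-group interaction term
    beta_without_group -- ability coefficient in the model without the group term
    beta_with_group    -- ability coefficient in the model with the group term
    alpha, change_threshold -- illustrative cut-offs (assumptions, not from the abstract)
    """
    # Step 1: a significant interaction is consistent with nonuniform DIF.
    if interaction_p < alpha:
        return "nonuniform"
    # Step 2: a marked shift in the ability coefficient once the group
    # term is added is taken as uniform DIF.
    relative_change = abs(beta_with_group - beta_without_group) / abs(beta_without_group)
    if relative_change > change_threshold:
        return "uniform"
    return "none"


print(classify_dif(0.01, 1.00, 1.00))  # significant interaction
print(classify_dif(0.50, 1.00, 1.20))  # 20% shift in the ability coefficient
print(classify_dif(0.50, 1.00, 1.05))  # 5% shift
```

As the abstract itself notes, the criteria used to declare DIF matter as much as the technique, so the two thresholds are left as parameters rather than fixed values.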
Specificity and false positive rates of the Test of Memory Malingering, Rey 15-item Test, and Rey Word Recognition Test among forensic inpatients with intellectual disabilities.

Science.gov (United States)

Love, Christopher M; Glassmire, David M; Zanolini, Shanna Jordan; Wolf, Amanda

2014-10-01

This study evaluated the specificity and false positive (FP) rates of the Rey 15-Item Test (FIT), Word Recognition Test (WRT), and Test of Memory Malingering (TOMM) in a sample of 21 forensic inpatients with mild intellectual disability (ID). The FIT demonstrated an FP rate of 23.8% with the standard quantitative cutoff score. Certain qualitative error types on the FIT showed promise and had low FP rates. The WRT obtained an FP rate of 0.0% with previously reported cutoff scores. Finally, the TOMM demonstrated low FP rates of 4.8% and 0.0% on Trial 2 and the Retention Trial, respectively, when applying the standard cutoff score. FP rates are reported for a range of cutoff scores and compared with published research on individuals diagnosed with ID. Results indicated that although the quantitative variables on the FIT had unacceptably high FP rates, the TOMM and WRT had low FP rates, increasing the confidence clinicians can place in scores reflecting poor effort on these measures during ID evaluations. © The Author(s) 2014.
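The reported rates follow from simple proportions over the 21 participants. The sketch below shows that arithmetic; the flagged counts (5 and 1) are not stated in the abstract and are assumed here only because they reproduce the reported 23.8% and 4.8%.

```python
def false_positive_rate(n_flagged, n_total):
    """Share of genuinely responding examinees flagged as showing poor effort."""
    return n_flagged / n_total


def specificity(n_flagged, n_total):
    """Specificity is the complement of the false positive rate."""
    return 1.0 - false_positive_rate(n_flagged, n_total)


N = 21  # sample size reported in the abstract

# Assumed counts, chosen to match the reported percentages.
fit_flagged = 5    # Rey 15-Item Test, standard cutoff
tomm2_flagged = 1  # TOMM Trial 2, standard cutoff

print(round(100 * false_positive_rate(fit_flagged, N), 1))    # 23.8
print(round(100 * false_positive_rate(tomm2_flagged, N), 1))  # 4.8
print(round(100 * specificity(fit_flagged, N), 1))            # 76.2
```

With a sample this small, each additional flagged participant moves the rate by almost five percentage points.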
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.

Science.gov (United States)

Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M

2016-09-01

The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analysis, CFA, and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden pseudo R² less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
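For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The toy response matrix is invented for illustration and has nothing to do with the study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 respondents, 3 items.
toy = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises with the number of items as well as with their intercorrelation, which is one reason a 35-item pool can reach values as high as the 0.97 reported here.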
Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

Science.gov (United States)

Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

2018-03-01

This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
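The person strata figure can be related to a separation index and a reliability coefficient using the relationships conventional in Rasch measurement: strata H = (4G + 1) / 3 and reliability R = G² / (1 + G²), where G is the person separation index. The sketch below assumes the study used these conventional definitions, which the abstract does not state.

```python
def strata_from_separation(g):
    """Number of statistically distinct person strata for separation index g."""
    return (4.0 * g + 1.0) / 3.0


def separation_from_strata(h):
    """Invert the strata formula to recover the separation index."""
    return (3.0 * h - 1.0) / 4.0


def reliability_from_separation(g):
    """Person separation reliability implied by separation index g."""
    return g ** 2 / (1.0 + g ** 2)


g = separation_from_strata(6.3)  # strata value reported in the abstract
print(round(g, 3))                               # 4.475
print(round(reliability_from_separation(g), 2))  # 0.95
```

Note that the reliability implied here is a Rasch person separation reliability, which is not the same statistic as the Cronbach's α reported in the abstract.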
INVESTIGATION OF MIS ITEM 011589A AND 3013 CONTAINERS HAVING SIMILAR CHARACTERISTICS

Energy Technology Data Exchange (ETDEWEB)

Friday, G

2006-08-23

Recent testing has identified the presence of hydrogen and oxygen in MIS Item 011589A. This isolated observation has raised concern regarding the potential for flammable gas mixtures in containers in the storage inventory. This study examines the known physicochemical characteristics of MIS Item 011589A and queries the ISP Database for items that are most similar or potentially similar. Items identified as most similar are believed to have the highest probability of being chemically and structurally identical to MIS Item 011589A. Items identified as potentially like MIS Item 011589A have some attributes in common and have the potential to generate gases, but have a lower probability of having similar gas generating characteristics. MIS Item 011589A is an oxide that was generated prior to 1990 at Rocky Flats in Building 707. It was associated with foundry processing and had an actinide assay of approximately 77%. Prompt gamma analysis of MIS Item 011589A indicated the presence of chloride, fluorine, magnesium, sodium, and aluminum. Queries based on MIS representation classification and process of origin were applied to the ISP Database. Evaluation criteria included binning classification (i.e., innocuous, pressure, or pressure and corrosion), availability of prompt gamma analyses, presence of chlorine and magnesium, percentage of chlorine by weight, peak ratios (i.e., Na:Cl and Mg:Na), moisture, and percent assay. These queries identified 15 items that were most similar and 106 items that were potentially like MIS Item 011589A. Although these queries identified containers that could potentially generate flammable gases, verification and confirmation can only be accomplished by destructive evaluation and testing of containers from the storage inventory.
Using Item Response Theory to Develop a 60-Item Representation of the NEO PI-R Using the International Personality Item Pool: Development of the IPIP-NEO-60.

Science.gov (United States)

Maples-Keller, Jessica L; Williamson, Rachel L; Sleep, Chelsea E; Carter, Nathan T; Campbell, W Keith; Miller, Joshua D

2017-10-31

Given the advantages of freely available and modifiable measures, an increase in the use of measures developed from the International Personality Item Pool (IPIP), including the 300-item representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992a), has occurred. The focus of this study was to use item response theory to develop a 60-item, IPIP-based measure of the Five-Factor Model (FFM) that provides equal representation of the FFM facets and to test the reliability and convergent and criterion validity of this measure compared to the NEO Five Factor Inventory (NEO-FFI). In an undergraduate sample (n = 359), scores from the NEO-FFI and IPIP-NEO-60 demonstrated good reliability and convergent validity with the NEO PI-R and IPIP-NEO-300. Additionally, across criterion variables in the undergraduate sample as well as a community-based sample (n = 757), the NEO-FFI and IPIP-NEO-60 demonstrated similar nomological networks across a wide range of external variables (r_ICC = .96). Finally, as expected, in an MTurk sample the IPIP-NEO-60 demonstrated advantages over the Big Five Inventory-2 (Soto & John, 2017; n = 342) with regard to the Agreeableness domain content. The results suggest strong reliability and validity of the IPIP-NEO-60 scores.
Development of six PROMIS pediatrics proxy-report item banks.

Science.gov (United States)

Irwin, Debra E; Gross, Heather E; Stucky, Brian D; Thissen, David; DeWitt, Esi Morgan; Lai, Jin Shei; Amtmann, Dagmar; Khastou, Leyla; Varni, James W; DeWalt, Darren A

2012-02-22

Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO) among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS) pediatric proxy-report item banks. The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact). Caregivers (n = 25) of children between the ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads). Caregivers of children ages 5 to 7 years completed the proxy-report survey (n = 432). In addition, caregivers completed other proxy instruments: PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Item content was well understood by proxies and did not require item revisions, but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%), married (70%), Caucasian (64%) and had at least a high school education (94%). Approximately 50% had children with a chronic health condition, primarily asthma, which was diagnosed or treated within 6
The differential item functioning and structural equivalence of a nonverbal cognitive ability test for five language groups

Directory of Open Access Journals (Sweden)

Pieter Schaap

2011-10-01

Research purpose: The aim of the study was to determine the differential item functioning (DIF) and structural equivalence of a nonverbal cognitive ability test (the PiB/SpEEx Observance test [401]) for five South African language groups. Motivation for study: Tests that are sensitive to cultural and language group can lead to unfair discrimination and are a contentious workplace issue in South Africa today. Misconceptions about psychometric testing in industry can cause tests to lose credibility if industries do not use a scientifically sound test-by-test evaluation approach. Research design, approach and method: The researcher used a quasi-experimental design and factor analytic and logistic regression techniques to meet the research aims. The study used a convenience sample drawn from industry and an educational institution. Main findings: The main findings of the study show structural equivalence of the test at a holistic level and nonsignificant DIF effect sizes for most of the comparisons that the researcher made. Practical/managerial implications: This research shows that the PiB/SpEEx Observance Test (401) is not completely language insensitive. One should see it rather as a language-reduced test when people from different language groups need testing. Contribution/value-add: The findings provide supporting evidence that nonverbal cognitive tests are plausible alternatives to verbal tests when one compares people from different language groups.
FIM-Minimum Data Set Motor Item Bank: Short Forms Development and Precision Comparison in Veterans.

Science.gov (United States)

Li, Chih-Ying; Romero, Sergio; Simpson, Annie N; Bonilha, Heather S; Simpson, Kit N; Hong, Ickpyo; Velozo, Craig A

2018-03-01

Objective: To improve the practical use of the short forms (SFs) developed from the item bank, we compared the measurement precision of the 4- and 8-item SFs generated from a motor item bank composed of the FIM and the Minimum Data Set (MDS). Design: The FIM-MDS motor item bank allowed scores generated from different instruments to be co-calibrated. The 4- and 8-item SFs were developed based on Rasch analysis procedures. This article compared person strata, ceiling/floor effects, and test SE plots for each administration form and examined 95% confidence interval error bands of anchored person measures with the corresponding SFs. We used 0.3 SE as a criterion to reflect a reliability level of .90. Setting: Veterans' inpatient rehabilitation facilities and community living centers. Participants: Veterans (N=2500) who had both FIM and MDS data within 6 days during 2008 through 2010. Interventions: Not applicable. Main outcome measures: Four- and 8-item SFs of FIM, MDS, and FIM-MDS motor item bank. Results: Six SFs were generated with 4 and 8 items across a range of difficulty levels from the FIM-MDS motor item bank. The three 8-item SFs all had higher correlations with the item bank (r=.82-.95), higher person strata, and less test error than the corresponding 4-item SFs (r=.80-.90). The three 4-item SFs did not meet the criteria of SE bank composed of existing instruments across the continuum of care in veterans. We also found that the number of items, not test specificity, determines the precision of the instrument. Copyright © 2017 American Congress of Rehabilitation Medicine. All rights reserved.
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

Science.gov (United States)

Liou, Pey-Yan; Bulut, Okan

2017-12-01

The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
Introduction to Psychology and Leadership. Part Nine; Morale and Esprit De Corps. Progress Check. Test Item Pool. Segments I & II.

Science.gov (United States)

Westinghouse Learning Corp., Annapolis, MD.

Test items for the introduction to psychology and leadership course (see the final reports which summarize the course development project, EM 010 418, EM 010 419, and EM 010 484) which were compiled as part of the project documentation and which are coordinated with the text-workbook on morale and esprit de corps (EM 010 439, EM 010 440, and EM…
Auditory Musical Reasoning Test (RAu): an initial study with Item Response Theory

Directory of Open Access Journals (Sweden)

Fernando Pessotto

2012-12-01

Full Text Available The present research aims to seek validity evidence, based on internal structure and on criterion, for an instrument that assesses the auditory processing of musical abilities (Auditory Processing Test with Musical Stimuli, RAu). To this end, 162 people of both sexes were assessed, 56.8% of them men, aged between 15 and 59 years (M=27.5; SD=9.01). Participants were divided into musicians (N=24), amateurs (N=62), and laypersons (N=76), according to their level of musical knowledge. The dimensionality of the instrument was verified by Full Information Factor Analysis, and the properties of the items by Item Response Theory (IRT). In addition, the study sought to identify the capacity to discriminate between the groups of musicians and non-musicians. The data provide evidence that the items measure one main dimension (alpha=0.92) with a high capacity to differentiate the groups of professional musicians, amateurs, and laypersons, yielding a criterion validity coefficient of r=0.68. The results indicate positive evidence of reliability and validity for the RAu.
Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

Science.gov (United States)

Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
The Body Appreciation Scale-2: item refinement and psychometric evaluation.

Science.gov (United States)

Tylka, Tracy L; Wood-Barcalow, Nichole L

2015-01-01

Considered a positive body image measure, the 13-item Body Appreciation Scale (BAS; Avalos, Tylka, & Wood-Barcalow, 2005) assesses individuals' acceptance of, favorable opinions toward, and respect for their bodies. While the BAS has accrued psychometric support, we improved it by rewording certain BAS items (to eliminate sex-specific versions and body dissatisfaction-based language) and developing additional items based on positive body image research. In three studies, we examined the reworded, newly developed, and retained items to determine their psychometric properties among college and online community (Amazon Mechanical Turk) samples of 820 women and 767 men. After exploratory factor analysis, we retained 10 items (five original BAS items). Confirmatory factor analysis upheld the BAS-2's unidimensionality and invariance across sex and sample type. Its internal consistency, test-retest reliability, and construct (convergent, incremental, and discriminant) validity were supported. The BAS-2 is a psychometrically sound positive body image measure applicable for research and clinical settings. Copyright © 2014 Elsevier Ltd. All rights reserved.
The Differences among Three-, Four-, and Five-Option-Item Formats in the Context of a High-Stakes English-Language Listening Test

Science.gov (United States)

Lee, HyeSun; Winke, Paula

2013-01-01

We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed…
Adaptive screening for depression--recalibration of an item bank for the assessment of depression in persons with mental and somatic diseases and evaluation in a simulated computer-adaptive test environment.

Science.gov (United States)

Forkmann, Thomas; Kroehne, Ulf; Wirtz, Markus; Norra, Christine; Baumeister, Harald; Gauggel, Siegfried; Elhan, Atilla Halil; Tennant, Alan; Boecker, Maren

2013-11-01

This study simulated computer-adaptive testing based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% female). Unidimensionality of the item bank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. CAT simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residualsLD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 to +2 logits when terminating at SE≤0.32 and 4 items if using SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment. © 2013.
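The two stopping criteria can be read as reliability targets. The sketch below uses the common approximation reliability ≈ 1 − SE² / Var(θ) and assumes a latent trait variance of 1, which the abstract does not state; the stopping function and its `max_items` parameter are likewise hypothetical and only illustrate how an SE-based termination rule is typically applied.

```python
def reliability_from_se(se, trait_variance=1.0):
    """Approximate reliability implied by a standard error of measurement.

    trait_variance=1.0 is an assumption made for illustration.
    """
    return 1.0 - se ** 2 / trait_variance


def should_stop(current_se, se_target, items_given, max_items):
    """Stop the adaptive test once precision is reached or items run out."""
    return current_se <= se_target or items_given >= max_items


print(round(reliability_from_se(0.32), 2))  # 0.9
print(reliability_from_se(0.50))            # 0.75

# Hypothetical state of a running CAT with a 36-item bank.
print(should_stop(current_se=0.41, se_target=0.32, items_given=7, max_items=36))   # False
print(should_stop(current_se=0.31, se_target=0.32, items_given=13, max_items=36))  # True
```

Under this reading, the looser SE≤0.50 criterion trades reliability of roughly 0.90 for roughly 0.75 in exchange for a much shorter test.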
Time-limited effects of emotional arousal on item and source memory.

Science.gov (United States)

Wang, Bo; Sun, Bukuan

2015-01-01

Two experiments investigated the time-limited effects of emotional arousal on consolidation of item and source memory. In Experiment 1, participants memorized words (items) and the corresponding speakers (sources) and then took an immediate free recall test. Then they watched a neutral, positive, or negative video 5, 35, or 50 min after learning, and 24 hours later they took surprise memory tests. Experiment 2 was similar to Experiment 1 except that (a) a reality monitoring task was used; (b) elicitation delays of 5, 30, and 45 min were used; and (c) delayed memory tests were given 60 min after learning. Both experiments showed that, regardless of elicitation delay, emotional arousal did not enhance item recall memory. Second, both experiments showed that negative arousal enhanced delayed item recognition memory only at the medium elicitation delay, but not in the shorter or longer delays. Positive arousal enhanced performance only in Experiment 1. Third, regardless of elicitation delay, emotional arousal had little effect on source memory. These findings have implications for theories of emotion and memory, suggesting that emotion effects are contingent upon the nature of the memory task and elicitation delay.
Acquiescence and different item and response formats in a summated self-rating scale

Directory of Open Access Journals (Sweden)

Nadene Hanekom

1998-06-01

Full Text Available This study examines the degree of acquiescence present when the item and response formats of a summated rating scale are varied. It is often recommended that acquiescence response bias in rating scales be controlled by using both positively and negatively worded items. Such items are generally worded in the Likert-type format of statements. The purpose of the study was to establish whether items in question format would result in a smaller degree of acquiescence than items worded as statements. The response format was also varied (five- and seven-point options) to determine whether this would influence the reliability and degree of acquiescence in the scales. A twenty-item Locus of Control (LC) questionnaire was used, but each item was complemented by its opposite, resulting in 40 items. The subjects, divided randomly into two groups, were second-year students who had to complete four versions of the questionnaire, plus a shortened version of Bass's scale for measuring acquiescence. The LC versions were questions or statements, each combined with a five- or seven-point response format. Partial counterbalancing was introduced by testing on two separate occasions, presenting the tests to the two groups in the opposite order. The degree of acquiescence was assessed by correlating the items with their opposites, and by correlating scores on each version with scores on the acquiescence questionnaire. No major differences were found between the various item and response formats in relation to acquiescence.
Gender Invariance of the Gambling Behavior Scale for Adolescents (GBS-A): An Analysis of Differential Item Functioning Using Item Response Theory.

Science.gov (United States)

Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina

2017-01-01

As there is a lack of evidence attesting to equivalent item functioning across genders for the instruments most often used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested to the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.

Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

Science.gov (United States)

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…
Differential item functioning magnitude and impact measures from item response theory models.

Science.gov (United States)

Kleinman, Marjorie; Teresi, Jeanne A

2016-01-01

Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages that provide magnitude and impact measures are described, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
A Comparison of Item Exposure Control Procedures with the Generalized Partial Credit Model

Science.gov (United States)

Sanchez, Edgar Isaac

2008-01-01

To enhance test security of high stakes tests, it is vital to understand the way various exposure control strategies function under various IRT models. To that end the present dissertation focused on the performance of several exposure control strategies under the generalized partial credit model with an item pool of 100 and 200 items. These…
ITEM LEVEL DIAGNOSTICS AND MODEL - DATA FIT IN ITEM ...

African Journals Online (AJOL)

Global Journal

Item response theory (IRT) is a framework for modeling and analyzing item response ... data. Though, there is an argument that the evaluation of fit in IRT modeling has been ... National Council on Measurement in Education ... model data fit should be based on three types of ... prediction should be assessed through the.
Development of six PROMIS pediatrics proxy-report item banks

Directory of Open Access Journals (Sweden)

Irwin Debra E

2012-02-01

Full Text Available Abstract Background: Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO) among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process, including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS) pediatric proxy-report item banks. Methods: The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact). Caregivers (n = 25) of children between the ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads). Caregivers of children ages 5 to 7 years completed the proxy-report survey (n = 432). In addition, caregivers completed other proxy instruments: PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Results: Item content was well understood by proxies and did not require item revisions, but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%), married (70%), Caucasian (64%) and had at least a high school education (94%). Approximately 50% had children with a chronic health condition, primarily
An Effective Multimedia Item Shell Design for Individualized Education: The Crome Project

Directory of Open Access Journals (Sweden)

Irene Cheng

2008-01-01

Full Text Available There are several advantages to creating multimedia item types and applying computer-based adaptive testing in education. First is the capability to motivate learning by making the learners feel more engaged and in an interactive environment. Second is a better concept representation, which is not possible in conventional multiple-choice tests. Third is the advantage of individualized curriculum design, rather than a curriculum designed for an average student. Fourth is a good choice of the next question, associated with the appropriate difficulty level based on a student's response to the current question. However, many issues need to be addressed when achieving these goals, including: (a) the large number of item types required to represent the current multiple-choice questions in multimedia formats, (b) the criterion used to determine the difficulty level of a multimedia question item, and (c) the methodology applied to the question selection process for individual students. In this paper, we propose a multimedia item shell design that not only reduces the number of item types required, but also computes the difficulty level of an item automatically. The concept of a question seed is introduced to make content creation more cost-effective. The proposed item shell framework facilitates efficient communication between user responses at the client and the scoring agents integrated with a student ability assessor at the server. We also describe approaches for automatically estimating the difficulty level of questions, and discuss a preliminary evaluation of multimedia item types by students.
Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing.

Science.gov (United States)

Cai, Li

2015-06-01

Lord and Wingersky's (Appl Psychol Meas 8:453-461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord-Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord-Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.
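The recursion that this abstract builds on can be sketched in a few lines. The block below is a minimal illustration of the original unidimensional Lord-Wingersky recursion for dichotomous items on a fixed quadrature grid, not the dimension-reduced version 2.0 described in the paper. The 2PL response function, the item parameters, the grid, and all function names are assumptions made for illustration.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lord_wingersky(probs):
    """Summed-score distribution at one fixed theta.

    probs[i] is P(item i correct | theta); the result dist[s] is
    P(summed score = s | theta) for s = 0..len(probs).
    """
    dist = [1.0]  # with zero items, score 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)  # item answered incorrectly
            new[s + 1] += mass * p      # item answered correctly
        dist = new
    return dist

def summed_score_eap(items, grid):
    """Posterior mean of theta per summed score, standard normal prior on a grid."""
    n = len(items)
    num = [0.0] * (n + 1)
    den = [0.0] * (n + 1)
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) weight
        like = lord_wingersky([p_2pl(theta, a, b) for a, b in items])
        for s in range(n + 1):
            num[s] += theta * like[s] * w
            den[s] += like[s] * w
    return [x / y for x, y in zip(num, den)]

# Hypothetical (a, b) parameters for four items, not taken from the paper.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]
grid = [-4.0 + 0.25 * i for i in range(33)]  # quadrature points from -4 to 4
for score, eap in enumerate(summed_score_eap(items, grid)):
    print(score, round(eap, 3))
```

The printed pairs form a summed-score to scaled-score translation table of the kind the abstract mentions. In a multidimensional model the same loop would run over every point of a product grid, which is where the exponential cost noted in the abstract comes from.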
Post-encoding emotional arousal enhances consolidation of item memory, but not reality-monitoring source memory.

Science.gov (United States)

Wang, Bo; Sun, Bukuan

2017-03-01

The current study examined whether the effect of post-encoding emotional arousal on item memory extends to reality-monitoring source memory and, if so, whether the effect depends on emotionality of learning stimuli and testing format. In Experiment 1, participants encoded neutral words and imagined or viewed their corresponding object pictures. Then they watched a neutral, positive, or negative video. The 24-hour delayed test showed that emotional arousal had little effect on both item memory and reality-monitoring source memory. Experiment 2 was similar except that participants encoded neutral, positive, and negative words and imagined or viewed their corresponding object pictures. The results showed that positive and negative emotional arousal induced after encoding enhanced consolidation of item memory, but not reality-monitoring source memory, regardless of emotionality of learning stimuli. Experiment 3, identical to Experiment 2 except that participants were tested only on source memory for all the encoded items, still showed that post-encoding emotional arousal had little effect on consolidation of reality-monitoring source memory. Taken together, regardless of emotionality of learning stimuli and regardless of testing format of source memory (conjunction test vs. independent test), the facilitatory effect of post-encoding emotional arousal on item memory does not generalize to reality-monitoring source memory.
The Dif Identification in Constructed Response Items Using Partial Credit Model

Directory of Open Access Journals (Sweden)

Heri Retnawati

2017-10-01

Full Text Available The study aimed to identify the presence, the type and the significance of differential item functioning (DIF) in constructed response items using the partial credit model (PCM). The data in the study were the students’ instruments and the students’ responses to PISA-like test items that had been completed by 386 ninth grade students and 460 tenth grade students, about 15 years old, in the Province of Yogyakarta Special Region in Indonesia. The analysis of the item characteristics, with students categorized by class, was conducted under the PCM using CONQUEST software. Furthermore, using these item characteristics, the researcher drew the category response function (CRF) graphs in order to identify whether the DIF was uniform or non-uniform. The significance of DIF was identified by comparing the discrepancy between the difficulty level parameters with the error in the CONQUEST output. The results of the analysis showed that of the 18 items analyzed, 4 items were not identified as containing DIF, 5 items were identified as containing DIF that was not statistically significant, and 9 items were identified as containing significant DIF. The causes of items containing DIF were discussed.
Applying Item Response Theory methods to design a learning progression-based science assessment

Science.gov (United States)

Chen, Jing

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats such as Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all
Why are the Mathematics National Examination Items Difficult and What Is Teachers’ Strategy to Overcome It?

Directory of Open Access Journals (Sweden)

Heri Retnawati

2017-07-01

Full Text Available The quality of national examination items plays an enormous role in identifying students’ mastery of competencies and their difficulties. This study aims to identify the difficult items in the Junior High School Mathematics National Examination, to find the factors that cause students’ difficulty and to reveal the strategies that the teachers and the students might implement in order to overcome them. The study is phenomenological research using mixed methods. The data were collected using documentation of students’ responses and focus group discussion (FGD) of teachers. The data analysis was conducted using the Miles & Huberman steps. The results of the study showed that 4 of the 40 test items were difficult for the students. The students’ difficulties were a lack of concept understanding, difficulties in calculating, difficulties in selecting information, being deceived by the distractors, being unaccustomed to completing complex and non-integer test items, and completing contextual test items presented in the form of figures or narrative texts.
Combined Common Person and Common Item Equating of Medical Science Examinations.

Science.gov (United States)

Kelley, Paul R.

This equating study of the National Board of Medical Examiners Examinations was a combined common persons and common items equating, using the Rasch model. The 1,000-item test was administered to about 3,000 second-year medical students in seven equal-length subtests: anatomy, physiology, biochemistry, pathology, microbiology, pharmacology, and…
Comparison of Exposure Controls, Item Pool Characteristics, and Population Distributions for CAT Using the Partial Credit Model

Science.gov (United States)

Lee, HwaYoung; Dodd, Barbara G.

2012-01-01

This study investigated item exposure control procedures under various combinations of item pool characteristics and ability distributions in computerized adaptive testing based on the partial credit model. Three variables were manipulated: item pool characteristics (120 items for each of easy, medium, and hard item pools), two ability…
Detecting Differential Item Functioning and Differential Step Functioning due to Differences that "Should" Matter

Directory of Open Access Journals (Sweden)

Tess Miller

2010-07-01

Full Text Available This study illustrates the use of differential item functioning (DIF) and differential step functioning (DSF) analyses to detect differences in item difficulty that are related to experiences of examinees, such as their teachers' instructional practices, that are relevant to the knowledge, skill, or ability the test is intended to measure. This analysis is in contrast to the typical use of DIF or DSF to detect differences related to characteristics of examinees, such as gender, language, or cultural knowledge, that should be irrelevant. Using data from two forms of Ontario's Grade 9 Assessment of Mathematics, analyses were performed comparing groups of students defined by their teachers' instructional practices. All constructed-response items were tested for DIF using the Mantel Chi-Square, standardized Liu-Agresti cumulative common log-odds ratio, and standardized Cox's noncentrality parameter. Items exhibiting moderate to large DIF were subsequently tested for DSF. In contrast to typical DIF or DSF analyses, which inform item development, these analyses have the potential to inform instructional practice.
Design of Web Questionnaires : A Test for Number of Items per Screen

NARCIS (Netherlands)

Toepoel, V.; Das, J.W.M.; van Soest, A.H.O.

2005-01-01

This paper presents results from an experimental manipulation of one versus multiple items per screen format in a Web survey. The purpose of the experiment was to find out if a questionnaire's format influences how respondents provide answers in online questionnaires and if this is depending on
Fitting a Mixture Rasch Model to English as a Foreign Language Listening Tests: The Role of Cognitive and Background Variables in Explaining Latent Differential Item Functioning

Science.gov (United States)

Aryadoust, Vahid

2015-01-01

The present study uses a mixture Rasch model to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental…
A comparison of Rasch item-fit and Cronbach's alpha item reduction analysis for the development of a Quality of Life scale for children and adolescents.

Science.gov (United States)

Erhart, M; Hagquist, C; Auquier, P; Rajmil, L; Power, M; Ravens-Sieberer, U

2010-07-01

This study compares item reduction analysis based on classical test theory (maximizing Cronbach's alpha - approach A), with analysis based on the Rasch Partial Credit Model item-fit (approach B), as applied to children and adolescents' health-related quality of life (HRQoL) items. The reliability and structural, cross-cultural and known-group validity of the measures were examined. Within the European KIDSCREEN project, 3019 children and adolescents (8-18 years) from seven European countries answered 19 HRQoL items of the Physical Well-being dimension of a preliminary KIDSCREEN instrument. The Cronbach's alpha and corrected item total correlation (approach A) were compared with infit mean squares and the Q-index item-fit derived according to a partial credit model (approach B). Cross-cultural differential item functioning (DIF ordinal logistic regression approach), structural validity (confirmatory factor analysis and residual correlation) and relative validity (RV) for socio-demographic and health-related factors were calculated for approaches (A) and (B). Approach (A) led to the retention of 13 items, compared with 11 items with approach (B). The item overlap was 69% for (A) and 78% for (B). The correlation coefficient of the summated ratings was 0.93. The Cronbach's alpha was similar for both versions [0.86 (A); 0.85 (B)]. Both approaches selected some items that are not strictly unidimensional and items displaying DIF. RV ratios favoured (A) with regard to socio-demographic aspects. Approach (B) was superior in RV with regard to health-related aspects. Both types of item reduction analysis should be accompanied by additional analyses. Neither of the two approaches was universally superior with regard to cultural, structural and known-group validity. However, the results support the usability of the Rasch method for developing new HRQoL measures for children and adolescents.
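The two classical-test-theory statistics named under approach (A), Cronbach's alpha and the corrected item-total correlation, can be computed as in the sketch below. The response matrix is made up for illustration and is not KIDSCREEN data, and the function names are hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def corrected_item_total(rows, j):
    """Correlation of item j with the sum of all other items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)

# Made-up responses: 6 respondents, 4 items scored 1-5.
responses = [
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
]
print("alpha:", round(cronbach_alpha(responses), 3))
for j in range(4):
    print("item", j, "r_it:", round(corrected_item_total(responses, j), 3))
```

In an alpha-maximizing item reduction, items with a low corrected item-total correlation are typically the first candidates for removal, since dropping them tends to raise alpha.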
Evaluation of five guidelines for option development in multiple-choice item-writing.

Science.gov (United States)

Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva

2009-05-01

This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Procurement Engineering Process for Commercial Grade Item Dedication

International Nuclear Information System (INIS)

Park, Jong-Hyuck; Park, Jong-Eun; Kwak, Tack-Hun; Yoo, Keun-Bae; Lee, Sang-Guk; Hong, Sung-Yull

2006-01-01

Procurement Engineering Process for commercial grade item dedication plays an increasingly important role in the operation management of Korea Nuclear Power Plants. The purpose of the Procurement Engineering Process is the provision and assurance of a high quality and quantity of spare, replacement, retrofit and new parts and equipment while maximizing plant availability, minimizing downtime due to parts unavailability and providing reasonable overall program and inventory cost. In this paper, we give an overview of the requirements, responsibilities and process for demonstrating with reasonable assurance that an item procured for potential nuclear safety related service or other essential plant service is adequate for its application. This paper does not cover the details of technical evaluation, selecting critical characteristics, selecting acceptance methods, performing failure modes and effects analysis, performing source surveillance, performing quality surveys, performing special tests and inspections, and the other aspects of effective Procurement Engineering and Commercial Grade Item Dedication. The main contribution of this paper is to provide an overview of Procurement Engineering Process for commercial grade item
Extent of awareness and prevalence of adulteration in selected food items in rural Dehradun

Directory of Open Access Journals (Sweden)

Ashok Kumar Srivastava

2016-09-01

Full Text Available Background: Adulteration of food items is a common phenomenon in India. It includes both willful adulteration to improve the texture and quality of food items and the supply of substandard food items. The usual outcome is an outbreak of food-borne illness. Aims & Objectives: To estimate (i) the prevalence of food adulteration in selected food items, (ii) the awareness of subjects regarding the food adulteration act, and (iii) their buying practices. Material and Methods: Sample size: 150 households were sampled, based on a prevalence of adulteration of around 50%, with a 95% confidence interval and an absolute allowable error of 10%. Sample households were drawn randomly from the selected villages. Pre-designed and pretested questionnaires were administered to fulfill the objectives, and food items were tested using the NICE food adulteration kit. Data were analyzed by number with percentage, Pearson’s correlation test and F test. Results: In 59.3% of households, housewives purchased the food items for the house. The prevalence of adulteration ranged from 17.3% to 66.2% in selected food items. Loose product was purchased by 54.3%. The food labels on packed items were not read by 86.3%. Mean percentage of purity was higher among literates (57.3 ±12.3) than among illiterates and those having primary education. A statistically significant F ratio was seen for mean percentage of purity and respondent’s literacy status. Conclusion: Adulteration is rampant in poor strata of society due to consumers’ illiteracy and lack of awareness of food safety rules.
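The sample-size inputs quoted in this abstract (about 50% anticipated prevalence, 95% confidence, 10% absolute error) can be plugged into the textbook formula for estimating a single proportion, n = z²·p·(1 − p)/d². The abstract does not say which formula or adjustments the authors used, so the sketch below is only an assumption about the general approach, and the function name is hypothetical.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within absolute error d.

    Uses n = z^2 * p * (1 - p) / d^2 under simple random sampling;
    z = 1.96 corresponds to a 95% confidence level.
    """
    return math.ceil(z * z * p * (1.0 - p) / (d * d))

# Inputs quoted in the abstract: p = 50%, absolute error = 10%.
print(sample_size_proportion(0.5, 0.10))
```

Under these assumptions the formula gives a minimum of 97 households, so the 150 households reported in the abstract exceed that minimum; the abstract does not explain how the larger figure was reached.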

The Role of Different Cognitive Components in the Prediction of the Figural Reasoning Test's Item Difficulty

Institute of Scientific and Technical Information of China (English)

李中权; 王力; 张厚粲; 周仁来

2011-01-01

Figural reasoning tests (as represented by Raven's tests) are widely applied as effective measures of fluid intelligence in recruitment and personnel selection. However, several studies have revealed that those tests are no longer appropriate due to high item exposure rates. Computerized automatic item generation (AIG) has gradually been recognized as a promising technique for handling item exposure. Understanding sources of item variation constitutes the initial stage of computerized AIG, that is, searching for the underlying processing components and the stimuli that significantly influence those components. Some studies have explored sources of item variation, but so far there are no consistent results. This study investigated the relation between item difficulties and stimuli factors (e.g., familiarity of figures, abstraction of attributes, perceptual organization, and memory load) and determined the relative importance of those factors in predicting item difficulties. Eight sets of figural reasoning tests (each set containing 14 items imitating items from Raven's Advanced Progressive Matrices, APM) were constructed by manipulating the familiarity of figures, the degree of abstraction of attributes, the perceptual organization as well as the types and number of rules. Using an anchor-test design, these tests were administered via the internet to 6323 participants, with 10 items drawn from the APM as anchor items; thus, each participant completed 14 items from one set and 10 anchor items within half an hour. In order to prevent participants from using a response elimination strategy, we presented one item stem first, then the alternatives in turn, and asked participants to determine which alternative was the best. DIMTEST analyses were conducted on the participants' responses on each of the eight tests. Results showed that the items measure a single dimension on each test. A likelihood ratio test indicated that the data fit the two-parameter logistic model (2PL) best. Items were
Development of a subjective cognitive decline questionnaire using item response theory: a pilot study.

Science.gov (United States)

Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L

2015-12-01

Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.
Determination of a Differential Item Functioning Procedure Using the Hierarchical Generalized Linear Model

Directory of Open Access Journals (Sweden)

Tülin Acar

2012-01-01

Full Text Available The aim of this research is to compare the results of differential item functioning (DIF) detection with the hierarchical generalized linear model (HGLM) technique against the results of DIF detection with the logistic regression (LR) and item response theory–likelihood ratio (IRT-LR) techniques on test items. To this end, the research first determines, with the HGLM, LR, and IRT-LR techniques, whether the items of the Turkish, Social Sciences, and Science subtests of the Secondary School Institutions Examination show DIF according to students' socioeconomic status (SES). When the correlations among the techniques in terms of identifying the items having DIF were inspected, a significant correlation was found between the results of the IRT-LR and LR techniques in all subtests; only in the Science subtest was the correlation between the results of the HGLM and IRT-LR techniques found to be significant. DIF analyses can be carried out on test items with other DIF analysis techniques that were outside the scope of this research. The results obtained by using the DIF techniques with different sample sizes can be compared.
Calibration of context-specific survey items to assess youth physical activity behaviour.

Science.gov (United States)

Saint-Maurice, Pedro F; Welk, Gregory J; Bartee, R Todd; Heelan, Kate

2017-05-01

This study tests calibration models to re-scale context-specific physical activity (PA) items to accelerometer-derived PA. A total of 195 children in grades 4-12 wore an Actigraph monitor and completed the Physical Activity Questionnaire (PAQ) one week later. The relative time spent in moderate-to-vigorous PA (MVPA%) obtained from the Actigraph at recess, PE, lunch, after-school, evening and weekend was matched with the respective item score obtained from the PAQ. Item scores from 145 participants were calibrated against objective MVPA% using multiple linear regression with age and sex as additional predictors. Predicted minutes of MVPA for school, out-of-school and total week were tested in the remaining sample (n = 50) using equivalence testing. The results showed that PAQ β-weights ranged from 0.06 (lunch) to 4.94 (PE) MVPA% (P PAQ and accelerometer MVPA at school and out-of-school ranged from -15.6 to +3.8 min and the PAQ was within 10-15% of accelerometer measured activity. This study demonstrated that context-specific items can be calibrated to predict minutes of MVPA in groups of youth during in- and out-of-school periods.
Testing the Item-Order Account of Design Effects Using the Production Effect

Science.gov (United States)

Jonker, Tanya R.; Levene, Merrick; MacLeod, Colin M.

2014-01-01

A number of memory phenomena evident in recall in within-subject, mixed-lists designs are reduced or eliminated in between-subject, pure-list designs. The item-order account (McDaniel & Bugg, 2008) proposes that differential retention of order information might underlie this pattern. According to this account, order information may be encoded…
Use of differential item functioning analysis to assess the equivalence of translations of a questionnaire

NARCIS (Netherlands)

Petersen, Morten Aa; Groenvold, Mogens; Bjorner, Jakob B.; Aaronson, Neil; Conroy, Thierry; Cull, Ann; Fayers, Peter; Hjermstad, Marianne; Sprangers, Mirjam; Sullivan, Marianne

2003-01-01

In cross-national comparisons based on questionnaires, accurate translations are necessary to obtain valid results. Differential item functioning (DIF) analysis can be used to test whether translations of items in multi-item scales are equivalent to the original. In data from 10,815 respondents
Test report for core drilling ignitability testing

International Nuclear Information System (INIS)

Witwer, K.S.

1996-01-01

Testing was carried out with the cooperation of Westinghouse Hanford Company and the United States Bureau of Mines at the Pittsburgh Research Center in Pennsylvania under the Memorandum of Agreement 14-09-0050-3666. Several core drilling equipment items, specifically those which can come in contact with flammable gases while drilling into some waste tanks, were tested under conditions similar to actual field sampling conditions. Rotary drilling against steel and rock as well as drop testing of several different pieces of equipment in a flammable gas environment were the specific items addressed. The test items completed either caused no ignition of the gas mixture, or, after having hardware changes or drilling parameters modified, produced no ignition in repeat testing.
Evaluation of the Relative Validity and Test-Retest Reliability of a 15-Item Beverage Intake Questionnaire in Children and Adolescents.

Science.gov (United States)

Hill, Catelyn E; MacDougall, Carly R; Riebl, Shaun K; Savla, Jyoti; Hedrick, Valisa E; Davy, Brenda M

2017-11-01

Added sugar intake, in the form of sugar-sweetened beverages (SSBs), may contribute to weight gain and obesity development in children and adolescents. A valid and reliable brief beverage intake assessment tool for children and adolescents could facilitate research in this area. The purpose of this investigation was to evaluate the relative validity and test-retest reliability of a 15-item beverage intake questionnaire (BEVQ) for assessing usual beverage intake in children and adolescents. This cross-sectional investigation included four study visits within a 2- to 3-week time period. Participants (333 enrolled; 98% completion rate) were children aged 6 to 11 years and adolescents aged 12 to 18 years recruited from the New River Valley, VA, region from January 2014 to September 2015. Study visits included assessment of height/weight, health history, and four 24-hour dietary recalls (24HRs). The BEVQ was completed at two visits (BEVQ 1, BEVQ 2). To evaluate relative validity, BEVQ 1 was compared with habitual beverage intake determined by the averaged 24HR. To evaluate test-retest reliability, BEVQ 1 was compared with BEVQ 2. Analyses included descriptive statistics, independent sample t tests, χ² tests, one-way analysis of variance, paired sample t tests, and correlational analyses. In the full sample, self-reported water and total SSB intake were not different between BEVQ 1 and 24HR (mean differences 0±1 fl oz and 0±1 fl oz, respectively; both P values >0.05). Reported intake across all beverage categories was significantly correlated between BEVQ 1 and BEVQ 2 (P [...] beverages was not different (all P values >0.05) between BEVQ 1 and 24HR (mean differences: whole milk=3±4 kcal, reduced-fat milk=9±5 kcal, and fat-free milk=7±6 kcal, which is 7±15 total beverage kilocalories). In adolescents (n=200), water and SSB kilocalories were not different (both P values >0.05) between BEVQ 1 and 24HR (mean differences: -1±1 fl oz and 12±9 kcal, respectively). A 15
Statistical and extra-statistical considerations in differential item functioning analyses

Directory of Open Access Journals (Sweden)

G. K. Huysamen

2004-10-01

Full Text Available This article briefly describes the main procedures for performing differential item functioning (DIF) analyses and points out some of the statistical and extra-statistical implications of these methods. Research findings on the sources of DIF, including those associated with translated tests, are reviewed. As DIF analyses are oblivious of correlations between a test and relevant criteria, the elimination of differentially functioning items does not necessarily improve predictive validity or reduce any predictive bias. The implications of the results of past DIF research for test development in the multilingual and multi-cultural South African society are considered.
Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

Science.gov (United States)

Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

2015-07-01

The Geriatric Anxiety Scale (GAS; Segal et al. (Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002)) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

Science.gov (United States)

Cher Wong, Cheow

2015-01-01

Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Measuring psychological trauma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Psychological Trauma item bank and short form.

Science.gov (United States)

Kisala, Pamela A; Victorson, David; Pace, Natalie; Heinemann, Allen W; Choi, Seung W; Tulsky, David S

2015-05-01

To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. A total of 716 individuals with SCI completed the trauma items. The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available.
Re-Examining Test Item Issues in the TIMSS Mathematics and Science Assessments

Science.gov (United States)

Wang, Jianjun

2011-01-01

As the largest international study ever undertaken, the Trends in International Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…
MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

Science.gov (United States)

Wang, Wen-Chung; Shih, Ching-Lin

2010-01-01

Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…
Psychometric properties of the Chinese version of resilience scale specific to cancer: an item response theory analysis.

Science.gov (United States)

Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong

2018-06-01

Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce the burden on patients. However, the generalizability of these findings warrants further investigation.
Item Response Data Analysis Using Stata Item Response Theory Package

Science.gov (United States)

Yang, Ji Seung; Zheng, Xiaying

2018-01-01

The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from the Programme for International Student Assessment, we review the IRT package from…
The short- and long-term fates of memory items retained outside the focus of attention.

Science.gov (United States)

LaRocque, Joshua J; Eichenbaum, Adam S; Starrett, Michael J; Rose, Nathan S; Emrich, Stephen M; Postle, Bradley R

2015-04-01

When a test of working memory (WM) requires the retention of multiple items, a subset of them can be prioritized. Recent studies have shown that, although prioritized (i.e., attended) items are associated with active neural representations, unprioritized (i.e., unattended) memory items can be retained in WM despite the absence of such active representations, and with no decrement in their recognition if they are cued later in the trial. These findings raise two intriguing questions about the nature of the short-term retention of information outside the focus of attention. First, when the focus of attention shifts from items in WM, is there a loss of fidelity for those unattended memory items? Second, could the retention of unattended memory items be accomplished by long-term memory mechanisms? We addressed the first question by comparing the precision of recall of attended versus unattended memory items, and found a significant decrease in precision for unattended memory items, reflecting a degradation in the quality of those representations. We addressed the second question by asking subjects to perform a WM task, followed by a surprise memory test for the items that they had seen in the WM task. Long-term memory for unattended memory items from the WM task was not better than memory for items that had remained selected by the focus of attention in the WM task. These results show that unattended WM representations are degraded in quality and are not preferentially represented in long-term memory, as compared to attended memory items.
Automated Test-Form Generation

Science.gov (United States)

van der Linden, Wim J.; Diao, Qi

2011-01-01

In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
A note on monotonicity of item response functions for ordered polytomous item response theory models.

Science.gov (United States)

Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua

2018-03-08

A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
Single-item memory, associative memory, and the human hippocampus

OpenAIRE

Gold, Jeffrey J.; Hopkins, Ramona O.; Squire, Larry R.

2006-01-01

We tested recognition memory for items and associations in memory-impaired patients with bilateral lesions thought to be limited to the hippocampal region. In Experiment 1 (Combined memory test), participants studied words and then took a memory test in which studied words, new words, studied word pairs, and recombined word pairs were presented in a mixed order. In Experiment 2 (Separated memory test), participants studied single words and then took a memory test involving studied word and ne...

Directed forgetting of complex pictures in an item method paradigm

OpenAIRE

Hauswald, Anne; Kissler, Johanna

2008-01-01

An item-cued directed forgetting paradigm was used to investigate the ability to control episodic memory and selectively encode complex coloured pictures. A series of photographs was presented to 21 participants who were instructed to either remember or forget each picture after it was presented. Memory performance was later tested with a recognition task where all presented items had to be retrieved, regardless of the initial instructions. A directed forgetting effect that is, better recogni...
Investigation of the Performance of Multidimensional Equating Procedures for Common-Item Nonequivalent Groups Design

Directory of Open Access Journals (Sweden)

Burcu ATAR

2017-12-01

Full Text Available In this study, the performance of the multidimensional extensions of the Stocking-Lord, mean/mean, and mean/sigma equating procedures under the common-item nonequivalent groups design was investigated. The performance of these three equating procedures was examined under combinations of various conditions including sample size, ability distribution, correlation between two dimensions, and percentage of anchor items in the test. Item parameter recovery was evaluated by calculating RMSE (root mean squared error) and BIAS values. It was found that the Stocking-Lord procedure provided the smaller RMSE and BIAS values for both item discrimination and item difficulty parameter estimates across most conditions.
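As a rough illustration of the two recovery criteria named in this record, the sketch below computes BIAS and RMSE for one item parameter across simulation replications, using the conventional definitions (mean signed error and root of the mean squared error). The function names and the numbers are hypothetical and are not taken from the study.

```python
import math


def bias(estimates, true_value):
    """Mean signed difference between the estimates and the generating value."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root mean squared error of the estimates around the generating value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


# Hypothetical estimates of one discrimination parameter over four replications.
true_a = 1.0
a_hat = [1.1, 0.9, 1.2, 1.0]

item_bias = bias(a_hat, true_a)
item_rmse = rmse(a_hat, true_a)
```

BIAS can be near zero while RMSE is large when over- and under-estimates cancel, which is why simulation studies of this kind usually report both.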
The construct equivalence and item bias of the pib/SpEEx conceptualisation-ability test for members of five language groups in South Africa

Directory of Open Access Journals (Sweden)

Pieter Schaap

2008-11-01

Full Text Available This study’s objective was to determine whether the Potential Index Batteries/Situation Specific Evaluation Expert (PIB/SpEEx) conceptualisation (100) ability test displays construct equivalence and item bias for members of five selected language groups in South Africa. The sample consisted of a non-probability convenience sample (N = 6 261) of members of five language groups (speakers of Afrikaans, English, North Sotho, Setswana and isiZulu) working in the medical and beverage industries or studying at higher-educational institutions. Exploratory factor analysis with target rotations confirmed the PIB/SpEEx 100’s construct equivalence for the respondents from these five language groups. No evidence of either uniform or non-uniform item bias of practical significance was found for the sample.
Measuring resilience after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Resilience item bank and short form.

Science.gov (United States)

Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF; however, after examination of effect sizes, we found the DIF to be negligible, with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items

Science.gov (United States)

Michaelides, Michalis P.

2010-01-01

The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model

Science.gov (United States)

Andrich, David; Humphry, Stephen M.; Marais, Ida

2012-01-01

Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…
Item response theory analysis of the Pain Self-Efficacy Questionnaire.

Science.gov (United States)

Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

2017-01-01

The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain
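For readers unfamiliar with the two-parameter graded response model mentioned in this record, the sketch below computes category probabilities for a single item. It assumes the usual Samejima parameterization, in which each threshold is the latent-trait value where the cumulative probability of responding in that category or higher equals 0.5. The function name and the parameter values are hypothetical; a 7-point scale, as used by the PSEQ, needs six ordered thresholds.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta      -- latent trait value
    a          -- item discrimination
    thresholds -- strictly increasing thresholds b_1 < ... < b_{K-1}
    Returns K probabilities, one per response category (lowest first).
    """
    # Cumulative curves P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item: discrimination 1.5, six thresholds for a 7-point scale.
probs = grm_category_probs(theta=0.0, a=1.5,
                           thresholds=[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
```

Plotting these probabilities over a grid of `theta` values gives the category probability curves that the abstract refers to.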
Does Gender-Specific Differential Item Functioning Affect the Structure in Vocational Interest Inventories?

Science.gov (United States)

Beinicke, Andrea; Pässler, Katja; Hell, Benedikt

2014-01-01

The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…
Item memory, source memory, and the medial temporal lobe: Concordant findings from fMRI and memory-impaired patients

OpenAIRE

Gold, Jeffrey J.; Smith, Christine N.; Bayley, Peter J.; Shrager, Yael; Brewer, James B.; Stark, Craig E. L.; Hopkins, Ramona O.; Squire, Larry R.

2006-01-01

We studied item and source memory with fMRI in healthy volunteers and carried out a parallel study in memory-impaired patients. In experiment 1, volunteers studied a list of words in the scanner and later took an item memory test and a source memory test. Brain activity in the hippocampal region, perirhinal cortex, and parahippocampal cortex was associated with words that would later be remembered (item memory). The activity in these regions that predicted subsequent success at item memory pr...
Item selection and ability estimation adaptive testing

NARCIS (Netherlands)

Pashley, Peter J.; van der Linden, Wim J.; van der Linden, Willem J.; Glas, Cornelis A.W.; Glas, Cees A.W.

2010-01-01

The last century saw a tremendous progression in the refinement and use of standardized linear tests. The first administered College Board exam occurred in 1901 and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, progressively more sophisticated standardized linear tests
Assessing the discriminating power of item and test scores in the linear factor-analysis model

Directory of Open Access Journals (Sweden)

Pere J. Ferrando

2012-01-01

Full Text Available Rigorous proposals based on a psychometric model for studying the imprecise concept of "discriminating power" are scarce and generally limited to nonlinear models for binary items. This article proposes a general framework for assessing the discriminating power of item and test scores that are calibrated with the common one-factor model. The proposal is organized around three criteria: (a) type of score, (b) range of discrimination, and (c) the specific aspect that is assessed. Within the proposed framework, (a) 16 measures are discussed, 6 of which appear to be new, and (b) the relations among them are studied. The usefulness of the proposal in psychometric applications that use the factor model is illustrated with an empirical example.
Response Mixture Modeling: Accounting for Heterogeneity in Item Characteristics across Response Times.

Science.gov (United States)

Molenaar, Dylan; de Boeck, Paul

2018-06-01

In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
New decision criteria for selecting delta check methods based on the ratio of the delta difference to the width of the reference range can be generally applicable for each clinical chemistry test item.

Science.gov (United States)

Park, Sang Hyuk; Kim, So-Young; Lee, Woochang; Chun, Sail; Min, Won-Ki

2012-09-01

Many laboratories use 4 delta check methods: delta difference, delta percent change, rate difference, and rate percent change. However, guidelines regarding decision criteria for selecting delta check methods have not yet been provided. We present new decision criteria for selecting delta check methods for each clinical chemistry test item. We collected 811,920 and 669,750 paired (present and previous) test results for 27 clinical chemistry test items from inpatients and outpatients, respectively. We devised new decision criteria for the selection of delta check methods based on the ratio of the delta difference to the width of the reference range (DD/RR). Delta check methods based on these criteria were compared with those based on the CV% of the absolute delta difference (ADD) as well as those reported in 2 previous studies. The delta check methods suggested by new decision criteria based on the DD/RR ratio corresponded well with those based on the CV% of the ADD except for only 2 items each in inpatients and outpatients. Delta check methods based on the DD/RR ratio also corresponded with those suggested in the 2 previous studies, except for 1 and 7 items in inpatients and outpatients, respectively. The DD/RR method appears to yield more feasible and intuitive selection criteria and can easily explain changes in the results by reflecting both the biological variation of the test item and the clinical characteristics of patients in each laboratory. We suggest this as a measure to determine delta check methods.
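The four delta check methods named in this record, together with the DD/RR ratio, can be sketched as follows. The definitions are the conventional ones (difference, percent change, and each of those divided by the time interval); the exact formulas and decision cut-offs used in the study are not given in the abstract, so the function names, the units and the example values are assumptions for illustration.

```python
def delta_checks(present, previous, days):
    """Return the four conventional delta check quantities for one result pair.

    present, previous -- current and prior results for the same test item
    days              -- time between the two results (must be > 0)
    """
    dd = present - previous       # delta difference
    dpc = dd / previous * 100.0   # delta percent change
    return {
        "delta_difference": dd,
        "delta_percent_change": dpc,
        "rate_difference": dd / days,        # change per day
        "rate_percent_change": dpc / days,   # percent change per day
    }


def dd_rr_ratio(present, previous, ref_low, ref_high):
    """Delta difference divided by the width of the reference range (DD/RR)."""
    return (present - previous) / (ref_high - ref_low)


# Hypothetical result pair taken three days apart, reference range 70-100.
checks = delta_checks(present=150.0, previous=120.0, days=3.0)
ratio = dd_rr_ratio(present=150.0, previous=120.0, ref_low=70.0, ref_high=100.0)
```

In this made-up example the delta difference equals the full width of the reference range, so the ratio is 1.0; how such a ratio maps onto the choice of delta check method is the subject of the study's decision criteria and is not reproduced here.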
Gender Differences in Figural Matrices: The Moderating Role of Item Design Features

Science.gov (United States)

Arendasy, Martin E.; Sommer, Markus

2012-01-01

There is a heated debate on whether observed gender differences in some figural matrices in adults can be attributed to gender differences in inductive reasoning/G[subscript f] or differential item functioning and/or test bias. Based on previous studies we hypothesized that three specific item design features moderate the effect size of the gender…
The leading hand in bimanual activities - A search for more valid handedness items.

Science.gov (United States)

Olsson, Bo; Kirchengast, Sylvia

2016-11-01

The aim of this pilot study is to test a new approach to handedness assessment based on the concept of the leading hand. A well-established graphomotor performance test of handedness (H-D-T) and a new test based on the concept of the leading hand were undertaken by 41 Viennese schoolchildren between 6 and 8 years of age. The new test is based on in vivo observations of bimanual activities. In detail, the test battery consisted of 8 fine motor leading hand items. Participants had to open and close four small objects (one tube, three small bottles) in order to observe twisting movements and four small objects (2 matchboxes, 2 small brushes) in order to observe back-and-forth movements. It turned out that the leading hand does not correlate with hand dominance in a graphomotor test to the degree that handedness in unimanual items has been found to do, and that right leading hand scores in bimanual items are encountered significantly less often than right hand scores in a graphomotor test. The findings of the present study suggest that tests of the leading hand in vivo may contribute to a higher validity of the assessment of handedness in examinations of the lateralization of higher cortical functions.
Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

Science.gov (United States)

Ali, Usama S.; Walker, Michael E.

2014-01-01

Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
Detecting Test Tampering Using Item Response Theory

Science.gov (United States)

Wollack, James A.; Cohen, Allan S.; Eckerly, Carol A.

2015-01-01

Test tampering, especially on tests for educational accountability, is an unfortunate reality, necessitating that the state (or its testing vendor) perform data forensic analyses, such as erasure analyses, to look for signs of possible malfeasance. Few statistical approaches exist for detecting fraudulent erasures, and those that do largely do not…
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

Science.gov (United States)

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011, using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence, while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The "Item Response Theory" analysis of the items shows that the instrument is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
Effects of Misbehaving Common Items on Aggregate Scores and an Application of the Mantel-Haenszel Statistic in Test Equating. CSE Report 688

Science.gov (United States)

Michaelides, Michalis P.

2006-01-01

Consistent behavior is a desirable characteristic that common items are expected to have when administered to different groups. Findings from the literature have established that items do not always behave in consistent ways; item indices and IRT item parameter estimates of the same items differ when obtained from different administrations.…
Item response analysis on an examination in anesthesiology for medical students in Taiwan: A comparison of one- and two-parameter logistic models

Directory of Open Access Journals (Sweden)

Yu-Feng Huang

2013-06-01

Conclusion: Item response models are useful for medical test analyses and provide valuable information about model comparisons and identification of differential items other than test reliability, item difficulty, and examinee's ability.

Automatic Item Generation via Frame Semantics: Natural Language Generation of Math Word Problems.

Science.gov (United States)

Deane, Paul; Sheehan, Kathleen

This paper is an exploration of the conceptual issues that have arisen in the course of building a natural language generation (NLG) system for automatic test item generation. While natural language processing techniques are applicable to general verbal items, mathematics word problems are particularly tractable targets for natural language…
Estimating reliability coefficients with heterogeneous item weightings using Stata: A factor based approach

NARCIS (Netherlands)

Boermans, M.A.; Kattenberg, M.A.C.

2011-01-01

We show how to estimate a Cronbach's alpha reliability coefficient in Stata after running a principal component or factor analysis. Alpha evaluates to what extent items measure the same underlying content when the items are combined into a scale or used for a latent variable. Stata allows for testing
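For orientation, the standard (equally weighted) Cronbach's alpha can be computed as in the sketch below. This is not the factor-based, heterogeneously weighted estimator the abstract describes, and it is written in Python rather than Stata; the function name and the toy score matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Standard Cronbach's alpha.

    scores: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents, 3 items.
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
]
print(round(cronbach_alpha(toy_scores), 3))
```

When every item ranks respondents identically with equal spread, alpha reaches 1; weaker inter-item agreement lowers it.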
The Long-Term Conditions Questionnaire: conceptual framework and item development.

Science.gov (United States)

Peters, Michele; Potter, Caroline M; Kelly, Laura; Hunter, Cheryl; Gibbons, Elizabeth; Jenkinson, Crispin; Coulter, Angela; Forder, Julien; Towers, Ann-Marie; A'Court, Christine; Fitzpatrick, Ray

2016-01-01

To identify the main issues of importance when living with long-term conditions to refine a conceptual framework for informing the item development of a patient-reported outcome measure for long-term conditions. Semi-structured qualitative interviews (n=48) were conducted with people living with at least one long-term condition. Participants were recruited through primary care. The interviews were transcribed verbatim and analyzed by thematic analysis. The analysis served to refine the conceptual framework, based on reviews of the literature and stakeholder consultations, for developing candidate items for a new measure for long-term conditions. Three main organizing concepts were identified: impact of long-term conditions, experience of services and support, and self-care. The findings helped to refine a conceptual framework, leading to the development of 23 items that represent issues of importance in long-term conditions. The 23 candidate items formed the first draft of the measure, currently named the Long-Term Conditions Questionnaire. The aim of this study was to refine the conceptual framework and develop items for a patient-reported outcome measure for long-term conditions, including single and multiple morbidities and physical and mental health conditions. Qualitative interviews identified the key themes for assessing outcomes in long-term conditions, and these underpinned the development of the initial draft of the measure. These initial items will undergo cognitive testing to refine the items prior to further validation in a survey.
Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index.

Science.gov (United States)

Roelen, Corné A M; van Rhenen, Willem; Groothoff, Johan W; van der Klink, Jac J L; Twisk, Jos W R; Heymans, Martijn W

2014-07-01

Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. This prospective cohort study comprised 11 537 male construction workers, who completed the WAI at baseline and reported DP after a mean 2.3 years of follow-up. WAS and WAI were calibrated for DP risk predictions with the Hosmer-Lemeshow (H-L) test and their ability to discriminate between high- and low-risk construction workers was investigated with the area under the receiver operating characteristic curve (AUC). At follow-up, 336 (3%) construction workers reported DP. Both WAS [odds ratio (OR) 0.72, 95% confidence interval (95% CI) 0.66-0.78] and WAI (OR 0.57, 95% CI 0.52-0.63) scores were associated with DP at follow-up. The WAS showed miscalibration (H-L model χ²=10.60; df=3; P=0.01) and poorly discriminated between high- and low-risk construction workers (AUC 0.67, 95% CI 0.64-0.70). In contrast, calibration (H-L model χ²=8.20; df=8; P=0.41) and discrimination (AUC 0.78, 95% CI 0.75-0.80) were both adequate for the WAI. Although associated with the risk of future DP, the single-item WAS poorly identified male construction workers at risk of DP. We recommend using the multi-item WAI to screen for risk of DP in occupational health practice.
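The discrimination measure reported above, the AUC, can be read as the probability that a randomly chosen worker who later reports DP has a higher risk score than a randomly chosen worker who does not. A minimal sketch of that pairwise definition follows; the function name and the scores are hypothetical and are not data from the study.

```python
def auc(risk_cases, risk_noncases):
    """Pairwise (Mann-Whitney) form of the AUC.

    Returns the share of case/non-case pairs in which the case has the
    higher risk score, counting ties as one half.
    """
    wins = 0.0
    for c in risk_cases:
        for n in risk_noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(risk_cases) * len(risk_noncases))


# Hypothetical single-item scores (0-10, higher = better work ability).
was_cases = [4, 5, 6, 7]        # later reported a disability pension
was_noncases = [6, 7, 8, 9, 9]  # did not

# Low work ability means high risk, so negate the score to get a risk score.
value = auc([-s for s in was_cases], [-s for s in was_noncases])
print(value)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.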
Test data on electrical contacts at high surface velocities and high current densities for homopolar generators

International Nuclear Information System (INIS)

Brennan, M.; Tolk, K.M.; Weldon, W.F.; Rylander, H.G.; Woodson, H.H.

1977-01-01

Test data is presented for one grade of copper graphite brush material, Morganite CMlS, over a wide range of surface velocities, atmospheres, and current densities that are expected for fast discharge (<100 ms) homopolar generators. The brushes were run on a copper coated 7075-T6 aluminum disk at surface speeds up to 277 m/sec. One electroplated copper and three flame sprayed copper coatings were used during the tests. Significant differences in contact voltage drops and surface mechanical properties of the copper coatings were observed
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

Science.gov (United States)

Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

2018-02-02

In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Function System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Automatic item generation implemented for measuring artistic judgment aptitude.

Science.gov (United States)

Bezruczko, Nikolaus

2014-01-01

Automatic item generation (AIG) is a broad class of methods that are being developed to address psychometric issues arising from internet and computer-based testing. In general, issues emphasize efficiency, validity, and diagnostic usefulness of large scale mental testing. Rapid prominence of AIG methods and their implicit perspective on mental testing is bringing painful scrutiny to many sacred psychometric assumptions. This report reviews basic AIG ideas, then presents conceptual foundations, image model development, and operational application to artistic judgment aptitude testing.
Laboratory tools for diagnosis and monitoring response in patients with chronic myeloid leukemia.

Science.gov (United States)

Tohami, Tali; Nagler, Arnon; Amariglio, Ninette

2012-08-01

Chronic myeloid leukemia (CML) is a clonal hematological disease that represents 15-20% of all adult leukemia cases. The study and treatment of CML has contributed pivotal advances to translational medicine and cancer therapy. The discovery that a single chromosomal abnormality, the Philadelphia (Ph) chromosome, is responsible for the etiology of this disease was a milestone for treating and understanding CML. Subsequently, CML became the first disease for which allogeneic bone marrow transplantation is the treatment of choice. Currently, CML is one of the few diseases where treatment targeted against the chromosomal abnormality is the sole frontline therapy for newly diagnosed patients. The use of directed therapy for CML challenged disease monitoring during treatment and led to the development of definitions that document response and predict relapse sooner than the former routine methods. These methods relied on classical cytogenetics through molecular cytogenetics (FISH) and, finally, on molecular monitoring assays. This review discusses the laboratory tools used for diagnosing CML, for monitoring during treatment, and for assessing remission or relapse. The advantages and disadvantages of each test, the common definition of response levels, and the efforts to standardize molecular monitoring for CML patient management are discussed.
Exploring differential item functioning (DIF) with the Rasch model: a comparison of gender differences on eighth grade science items in the United States and Spain.

Science.gov (United States)

Babiar, Tasha Calvert

2011-01-01

Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.
Generalizability theory and item response theory

NARCIS (Netherlands)

Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.

2012-01-01

Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a
Sharing the cost of redundant items

DEFF Research Database (Denmark)

Hougaard, Jens Leth; Moulin, Hervé

2014-01-01

We ask how to share the cost of finitely many public goods (items) among users with different needs: some smaller subsets of items are enough to serve the needs of each user, yet the cost of all items must be covered, even if this entails inefficiently paying for redundant items. Typical examples...... are network connectivity problems when an existing (possibly inefficient) network must be maintained. We axiomatize a family of cost ratios based on simple liability indices, one for each agent and for each item, measuring the relative worth of this item across agents, and generating cost allocation rules...... additive in costs....
Applying automatic item generation to create cohesive physics testlets

Science.gov (United States)

Mindyarto, B. N.; Nugroho, S. E.; Linuwih, S.

2018-03-01

Computer-based testing has created the demand for large numbers of items. This paper discusses the production of cohesive physics testlets using automatic item generation concepts and procedures. The testlets were composed by restructuring physics problems to reveal deeper understanding of the underlying physical concepts by inserting a qualitative question and its scientific reasoning question. A template-based testlet generator was used to generate the testlet variants. Using this methodology, 1248 testlet variants were effectively generated from 25 testlet templates. Some issues related to the effective application of the generated physics testlets in practical assessments were discussed.
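The idea of a template-based generator can be sketched in a few lines: a template with named slots is expanded over every combination of slot values. This is only an illustration of the general technique, not the generator used in the paper; the template text, slot names, and values are hypothetical.

```python
from itertools import product


def generate_variants(template, slots):
    """Yield one item text per combination of slot values.

    template: a str.format-style string with named placeholders.
    slots: dict mapping each placeholder name to its allowed values.
    """
    names = list(slots)
    for values in product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, values)))


# Hypothetical testlet stem: a quantitative setup followed by a
# qualitative question and a request for the reasoning behind the answer.
template = (
    "A {mass} kg cart accelerates at {accel} m/s^2. "
    "Is the net force larger or smaller than 10 N? Explain why."
)
slots = {"mass": [1, 2, 4], "accel": [2, 3]}

variants = list(generate_variants(template, slots))
print(len(variants))
print(variants[0])
```

The number of variants is the product of the slot sizes (here 3 x 2), which is how a small set of templates can yield a large item pool; a practical generator would also need constraints to filter out implausible or duplicate combinations.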
MMPI-2 Item Endorsements in Dissociative Identity Disorder vs. Simulators.

Science.gov (United States)

Brand, Bethany L; Chasson, Gregory S; Palermo, Cori A; Donato, Frank M; Rhodes, Kyle P; Voorhees, Emily F

2016-03-01

Elevated scores on some MMPI-2 (Minnesota Multiphasic Personality Inventory-2) validity scales are common among patients with dissociative identity disorder (DID), which raises questions about the validity of their responses. Such patients show elevated scores on atypical answers (F), F-psychopathology (Fp), atypical answers in the second half of the test (FB), schizophrenia (Sc), and depression (D) scales, with Fp showing the greatest utility in distinguishing them from coached and uncoached DID simulators. In the current study, we investigated the items on the MMPI-2 F, Fp, FB, Sc, and D scales that were most and least commonly endorsed by participants with DID in our 2014 study and compared these responses with those of coached and uncoached DID simulators. The comparisons revealed that patients with DID most frequently endorsed items related to dissociation, trauma, depression, fearfulness, conflict within family, and self-destructiveness. The coached group more successfully imitated item endorsements of the DID group than did the uncoached group. However, both simulating groups, especially the uncoached group, frequently endorsed items that were uncommonly endorsed by the DID group. The uncoached group endorsed items consistent with popular media portrayals of people with DID being violent, delusional, and unlawful. These results suggest that item endorsement patterns can provide useful information to clinicians making determinations about whether an individual is presenting with DID or feigning. © 2016 American Academy of Psychiatry and the Law.
International Assessment: A Rasch Model and Teachers' Evaluation of TIMSS Science Achievement Items

Science.gov (United States)

Glynn, Shawn M.

2012-01-01

The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform…
Dissociating the neural correlates of intra-item and inter-item working-memory binding.

Directory of Open Access Journals (Sweden)

Carinne Piekema

BACKGROUND: Integration of information streams into a unitary representation is an important task of our cognitive system. Within working memory, the medial temporal lobe (MTL) has been conceptually linked to the maintenance of bound representations. In a previous fMRI study, we have shown that the MTL is indeed more active during working-memory maintenance of spatial associations as compared to non-spatial associations or single items. There are two explanations for this result: the mere presence of the spatial component activates the MTL, or the MTL is recruited to bind associations between neurally non-overlapping representations. METHODOLOGY/PRINCIPAL FINDINGS: The current fMRI study investigates this issue further by directly comparing intrinsic intra-item binding (object/colour), extrinsic intra-item binding (object/location), and inter-item binding (object/object). The three binding conditions resulted in differential activation of brain regions. Specifically, we show that the MTL is important for establishing extrinsic intra-item associations and inter-item associations, in line with the notion that binding of information processed in different brain regions depends on the MTL. CONCLUSIONS/SIGNIFICANCE: Our findings indicate that different forms of working-memory binding rely on specific neural structures. In addition, these results extend previous reports indicating that the MTL is implicated in working-memory maintenance, challenging the classic distinction between short-term and long-term memory systems.
Generalizability theory and item response theory

OpenAIRE

Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.

2012-01-01

Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a selected-response format. This chapter presents a short overview of how item response theory and generalizability theory were integrated to model such assessments. Further, the precision of the esti...
22 CFR 121.8 - End-items, components, accessories, attachments, parts, firmware, software and systems.

Science.gov (United States)

2010-04-01

... an assembled article ready for its intended use. Only ammunition, fuel or another energy source is...-item without which the end-item is inoperable. (Example: Airframes, tail sections, transmissions, tank..., operating systems and support software for design, implementation, test, operation, diagnosis and repair. A...
Testing the robustness of deterministic models of optimal dynamic pricing and lot-sizing for deteriorating items under stochastic conditions

DEFF Research Database (Denmark)

Ghoreishi, Maryam

2018-01-01

Many models within the field of optimal dynamic pricing and lot-sizing models for deteriorating items assume everything is deterministic and develop a differential equation as the core of analysis. Two prominent examples are the papers by Rajan et al. (Manag Sci 38:240–262, 1992) and Abad (Manag......, we will try to expose the model by Abad (1996) and Rajan et al. (1992) to stochastic inputs; however, designing these stochastic inputs such that they as closely as possible are aligned with the assumptions of those papers. We do our investigation through a numerical test where we test the robustness...... of the numerical results reported in Rajan et al. (1992) and Abad (1996) in a simulation model. Our numerical results seem to confirm that the results stated in these papers are indeed robust when being imposed to stochastic inputs....
Isolation and killing of candidate chronic myeloid leukemia stem cells by antibody targeting of IL-1 receptor accessory protein

DEFF Research Database (Denmark)

Järås, Marcus; Johnels, Petra; Hansen, Nils Gunder

2010-01-01

Chronic myeloid leukemia (CML) is genetically characterized by the Philadelphia (Ph) chromosome, formed through a reciprocal translocation between chromosomes 9 and 22 and giving rise to the constitutively active tyrosine kinase P210 BCR/ABL1. Therapeutic strategies aiming for a cure of CML...... will require full eradication of Ph chromosome-positive (Ph(+)) CML stem cells. Here we used gene-expression profiling to identify IL-1 receptor accessory protein (IL1RAP) as up-regulated in CML CD34(+) cells and also in cord blood CD34(+) cells as a consequence of retroviral BCR/ABL1 expression. To test...
An Item Bank for Abuse of Prescription Pain Medication from the Patient-Reported Outcomes Measurement Information System (PROMIS®).

Science.gov (United States)

Pilkonis, Paul A; Yu, Lan; Dodds, Nathan E; Johnston, Kelly L; Lawrence, Suzanne M; Hilton, Thomas F; Daley, Dennis C; Patkar, Ashwin A; McCarty, Dennis

2017-08-01

There is a need to monitor patients receiving prescription opioids to detect possible signs of abuse. To address this need, we developed and calibrated an item bank for severity of abuse of prescription pain medication as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 5,310 items relevant to substance use and abuse, including abuse of prescription pain medication, from over 80 unique instruments. After qualitative item analysis (i.e., focus groups, cognitive interviewing, expert review, and item revision), 25 items for abuse of prescribed pain medication were included in field testing. Items were written in a first-person, past-tense format, with a three-month time frame and five response options reflecting frequency or severity. The calibration sample included 448 respondents, 367 from the general population (ascertained through an internet panel) and 81 from community treatment programs participating in the National Drug Abuse Treatment Clinical Trials Network. A final bank of 22 items was calibrated using the two-parameter graded response model from item response theory. A seven-item static short form was also developed. The test information curve showed that the PROMIS® item bank for abuse of prescription pain medication provided substantial information in a broad range of severity. The initial psychometric characteristics of the item bank support its use as a computerized adaptive test or short form, with either version providing a brief, precise, and efficient measure relevant to both clinical and community samples. © 2016 American Academy of Pain Medicine. All rights reserved.
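For readers unfamiliar with the two-parameter graded response model mentioned above, the sketch below shows how it assigns probabilities to ordered response categories. It uses the standard logistic form of the model; the function name and the parameter values are hypothetical and are not calibration results from the item bank.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the logistic graded response model.

    theta: latent trait level of the respondent.
    a: item discrimination.
    thresholds: increasing threshold parameters b_1 < ... < b_(K-1)
                for an item with K ordered response options.
    """
    def p_at_or_above(b):
        # Probability of responding in the category above threshold b or higher.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    # Cumulative probabilities, padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [p_at_or_above(b) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item with five response options, hence four thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

The discrimination parameter controls how sharply the category probabilities change with the latent trait, and the thresholds locate each category boundary on the severity scale.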

The Effect of Mini and Midi Anchor Tests on Test Equating

Science.gov (United States)

Arikan, Çigdem Akin

2018-01-01

The main purpose of this study is to compare the performance of the midi anchor test and the mini anchor test in equating test forms based on item response theory. The research was conducted using simulated data generated under the Rasch model. In order to equate two test forms the anchor item nonequivalent groups (internal anchor test) was…
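Generating dichotomous responses under the Rasch model, as the abstract describes, can be sketched as follows. This covers only the data-generation step, not the anchor-test equating itself; the function names, sample size, ability distribution, and item difficulties are hypothetical.

```python
import math
import random


def rasch_prob(theta, b):
    # Probability of a correct response for ability theta and difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_responses(thetas, difficulties, rng):
    """Return a persons-by-items matrix of 0/1 responses."""
    return [
        [1 if rng.random() < rasch_prob(theta, b) else 0 for b in difficulties]
        for theta in thetas
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]  # assumed N(0, 1) abilities
difficulties = [-1.0, 0.0, 1.0]                     # easy, medium, hard item
data = simulate_responses(thetas, difficulties, rng)

# Proportion correct per item; easier items should be answered correctly more often.
p_correct = [sum(row[i] for row in data) / len(data) for i in range(len(difficulties))]
print(p_correct)
```

In an equating study, a subset of such items would be shared between two simulated forms to serve as the anchor.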
The randomly renewed general item and the randomly inspected item with exponential life distribution

International Nuclear Information System (INIS)

Schneeweiss, W.G.

1979-01-01

For a randomly renewed item the probability distributions of the time to failure and of the duration of down time and the expectations of these random variables are determined. Moreover, it is shown that the same theory applies to randomly checked items with exponential probability distribution of life such as electronic items. The case of periodic renewals is treated as an example.
Quantitative penetration testing with item response theory

NARCIS (Netherlands)

Pieters, W.; Arnold, F.; Stoelinga, M.I.A.

2013-01-01

Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Therefore, penetration testing has thus far been used as a qualitative research method. To enable quantitative approaches to security risk management,
Measuring self-esteem after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Self-esteem item bank and short form.

Science.gov (United States)

Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Quantitative Penetration Testing with Item Response Theory

NARCIS (Netherlands)

Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle Ida Antoinette

2014-01-01

Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Thus, penetration testing has so far been used as a qualitative research method. To enable quantitative approaches to security risk management, including
Quantitative penetration testing with item response theory

NARCIS (Netherlands)

Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle

2013-01-01

Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Thus, penetration testing has so far been used as a qualitative research method. To enable quantitative approaches to security risk management, including
Cognitive interviewing methodology in the development of a pediatric item bank: a patient reported outcomes measurement information system (PROMIS) study

Directory of Open Access Journals (Sweden)

DeWalt Darren A

2009-01-01

Background: The evaluation of patient-reported outcomes (PROs) in health care has seen greater use in recent years, and methods to improve the reliability and validity of PRO instruments are advancing. This paper discusses the cognitive interviewing procedures employed by the Patient Reported Outcomes Measurement Information System (PROMIS) pediatrics group for the purpose of developing a dynamic, electronic item bank for field testing with children and adolescents using novel computer technology. The primary objective of this study was to conduct cognitive interviews with children and adolescents to gain feedback on items measuring physical functioning, emotional health, social health, fatigue, pain, and asthma-specific symptoms. Methods: A total of 88 cognitive interviews were conducted with 77 children and adolescents across two sites on 318 items. From this initial item bank, 25 items were deleted and 35 were revised and underwent a second round of cognitive interviews. A total of 293 items were retained for field testing. Results: Children as young as 8 years of age were able to comprehend the majority of items, response options, directions, and the recall period, and to identify problems with language that was difficult for them to understand. Cognitive interviews indicated issues with item comprehension on several items, which led to alternative wording for these items. Conclusion: Children ages 8–17 years were able to comprehend most item stems and response options in the present study. Field testing with the resulting items and response options is presently being conducted as part of the PROMIS Pediatric Item Bank development process.
Identification and Development of Items Comprising Organizational Citizenship Behaviors Among Pharmacy Faculty.

Science.gov (United States)

Desselle, Shane P; Semsick, Gretchen R

2016-12-25

Objective. Identify behaviors that can compose a measure of organizational citizenship by pharmacy faculty. Methods. A four-round, modified Delphi procedure using open-ended questions (Round 1) was conducted with 13 panelists from pharmacy academia. The items generated were evaluated and refined for inclusion in subsequent rounds. A consensus was reached after completing four rounds. Results. The panel produced a set of 26 items indicative of extra-role behaviors by faculty colleagues considered to compose a measure of citizenship, which is an expressed manifestation of collegiality. Conclusions. The items generated require testing for validation and reliability in a large sample to create a measure of organizational citizenship. Even prior to doing so, the list of items can serve as a resource for mentorship of junior and senior faculty alike.
Identification and Development of Items Comprising Organizational Citizenship Behaviors Among Pharmacy Faculty

Science.gov (United States)

Semsick, Gretchen R.

2016-01-01

Objective. Identify behaviors that can compose a measure of organizational citizenship by pharmacy faculty. Methods. A four-round, modified Delphi procedure using open-ended questions (Round 1) was conducted with 13 panelists from pharmacy academia. The items generated were evaluated and refined for inclusion in subsequent rounds. A consensus was reached after completing four rounds. Results. The panel produced a set of 26 items indicative of extra-role behaviors by faculty colleagues considered to compose a measure of citizenship, which is an expressed manifestation of collegiality. Conclusions. The items generated require testing for validation and reliability in a large sample to create a measure of organizational citizenship. Even prior to doing so, the list of items can serve as a resource for mentorship of junior and senior faculty alike. PMID:28179717
A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

Science.gov (United States)

Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

2011-01-01

We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model

Science.gov (United States)

Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim

2017-01-01

We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
Análise de itens de uma prova de raciocínio estatístico (Analysis of items of a statistical reasoning test)

Directory of Open Access Journals (Sweden)

Claudette Maria Medeiros Vendramini

2004-12-01

Full Text Available This study aimed to analyze the 18 multiple-choice questions of a test on basic concepts of statistics using classical and modern test theories. The test was taken by 325 undergraduate students, randomly selected from the areas of human, exact, and health sciences. The analysis indicated that the test is predominantly unidimensional and that the items are better fitted by the three-parameter model. The difficulty, discrimination, and biserial correlation indices show acceptable values. The authors suggest adding new items to the test in order to obtain reliability and validity for use in the educational context and to reveal the statistical reasoning of undergraduate students when reading representations of statistical data.
Using Reversed MFCC and IT-EM for Automatic Speaker Verification

Directory of Open Access Journals (Sweden)

Sheeraz Memon

2012-01-01

Full Text Available This paper proposes a text-independent automatic speaker verification system using IMFCC (Inverse/Reverse Mel Frequency Coefficients) and IT-EM (Information Theoretic Expectation Maximization). For speaker verification, feature extraction using the Mel scale has been widely applied and has yielded better results. The IMFCC is based on the inverse Mel scale and effectively captures information available at the high-frequency formants, which is ignored by the MFCC. In this paper the fusion of MFCC and IMFCC at input level is proposed. GMMs (Gaussian Mixture Models) based on EM (Expectation Maximization) have been widely used for classification in text-independent verification. However, EM suffers from convergence issues. In this paper we use our proposed IT-EM, which has faster convergence, to train speaker models. IT-EM uses information theory principles such as PDE (Parzen Density Estimation) and the KL (Kullback-Leibler) divergence measure. Like EM, IT-EM adapts the weights, means and covariances. However, the IT-EM process is not performed on feature vector sets but on a set of centroids obtained using an IT (Information Theoretic) metric. The IT-EM process at once reduces the divergence measure between PDE estimates of the feature distribution within a given class and the centroid distribution within the same class. The feature-level fusion and IT-EM are tested on the task of speaker verification using NIST2001 and NIST2004. The experimental evaluation shows that MFCC/IMFCC gives better results than the conventional delta/MFCC feature set. The MFCC/IMFCC feature vector is also much smaller than the delta MFCC vector, which reduces the computational burden. The IT-EM method also showed faster convergence than the conventional EM method, and thus it leads to higher speaker recognition scores.
17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

Science.gov (United States)

2010-04-01

... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form.

Science.gov (United States)

Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W

2015-05-01

To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
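To illustrate what the slopes and thresholds mentioned in this record parameterize, here is a minimal sketch of the category probabilities of a graded response model in its usual logistic form. The function name and the parameter values are hypothetical and are not taken from the SCI-QOL calibration.

```python
import numpy as np

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait value; slope: item discrimination;
    thresholds: increasing thresholds (K - 1 values for K response categories).
    """
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities of responding in category k or higher, k = 1..K-1.
    p_star = 1.0 / (1.0 + np.exp(-slope * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent terms.
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical item with four response categories (three thresholds).
probs = grm_category_probs(theta=0.5, slope=1.8, thresholds=[-1.0, 0.0, 1.2])
print(probs, probs.sum())
```

The probabilities sum to one by construction, and a steeper slope concentrates them more sharply around the thresholds nearest to theta.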
The PROMIS Physical Function item bank was calibrated to a standardized metric and shown to improve measurement efficiency.

Science.gov (United States)

Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E

2014-05-01

To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range.
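The norming to a mean of 50 and SD of 10 described in this record can be pictured as a linear rescaling against a reference sample. The sketch below assumes that simple form; the function name and the reference values are hypothetical and do not come from the PROMIS calibration data.

```python
import numpy as np

def to_t_scores(theta, ref_mean, ref_sd):
    """Rescale latent trait estimates so the reference sample has mean 50, SD 10."""
    theta = np.asarray(theta, dtype=float)
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd

# Hypothetical trait estimates from a reference (general population) sample.
reference = np.array([-1.2, -0.4, 0.0, 0.3, 0.9, 1.6])
t = to_t_scores(reference, reference.mean(), reference.std(ddof=1))
print(t.round(1))
```

On this metric a respondent scoring 60 sits one reference-sample SD above the reference mean.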
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain.

Science.gov (United States)

Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B

2015-01-01

The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity, correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
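For readers unfamiliar with the reliability statistic reported in this record, the following is a minimal sketch of the standard Cronbach's alpha formula computed from a respondents-by-items score matrix. The function name and the toy data are hypothetical; this is not the authors' analysis code.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                              # number of items
    item_vars = x.var(axis=0, ddof=1)           # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)       # sample variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical responses of five people to three 5-point items.
toy = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
])
print(cronbach_alpha(toy))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why a long, unidimensional bank can reach values as high as the one reported here.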
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain.

Directory of Open Access Journals (Sweden)

Martine H P Crins

Full Text Available The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity, correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
Dutch translation and cross-cultural adaptation of the PROMIS® physical function item bank and cognitive pre-test in Dutch arthritis patients.

Science.gov (United States)

Oude Voshaar, Martijn AH; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart AFJ

2012-03-05

Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols, and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that, overall, questions of the Dutch version were understood as they were intended, and only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch-speaking populations.
Statistical power as a function of Cronbach alpha of instrument questionnaire items.

Science.gov (United States)

Heo, Moonseong; Kim, Namhee; Faith, Myles S

2015-10-14

In countless clinical trials, measurements of outcomes rely on instrument questionnaire items, which often suffer from measurement error, which in turn affects the statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. The scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume a fixed true score variance, as opposed to the usual assumption of a fixed total variance. That assumption is critical and practically relevant to show that smaller measurement errors are associated with higher inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing the significance of C(α). Closed-form power functions and sample size determination formulas are derived in terms of C(α) for all of the aforementioned comparisons. Power functions are shown to be increasing functions of C(α), regardless of the comparison of interest. The derived power functions are well validated by simulation studies, which show that the magnitudes of theoretical power are virtually identical to those of the empirical power. Regardless
