WorldWideScience

Sample records for plato-based test item

  1. Writing better test items.

    Science.gov (United States)

    Aucoin, Julia W

    2005-01-01

    Professional development specialists have had little opportunity to learn how to write test items to meet the expectations of today's graduate nurse. Schools of nursing have moved away from knowledge-level test items and have had to develop more application and analysis items to prepare graduates for the National Council Licensure Examination (NCLEX). This same type of question can be used effectively to support a competence assessment system and document critical thinking skills.

  2. Screening Test Items for Differential Item Functioning

    Science.gov (United States)

    Longford, Nicholas T.

    2014-01-01

    A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
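
    Longford's approach builds on explicit loss functions; as a simpler point of reference, the classical Mantel-Haenszel statistic is the most widely used DIF screening tool. A minimal sketch with invented toy counts, not data from the study:

```python
# Minimal sketch of Mantel-Haenszel DIF screening (a standard baseline,
# not the loss-function method described in the abstract above).
# Examinees are stratified by total score; for each stratum we tabulate
# correct/incorrect counts for the reference and focal groups.

def mantel_haenszel_odds_ratio(strata):
    """strata: list of (A, B, C, D) tuples per score stratum, where
    A = reference correct, B = reference incorrect,
    C = focal correct,     D = focal incorrect."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den  # common odds ratio; 1.0 means no DIF

# Toy data: three score strata, item noticeably harder for the focal group.
strata = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
print(round(mantel_haenszel_odds_ratio(strata), 2))   # → 2.5
```

An odds ratio well above 1 across strata would flag the item for review.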

  3. Selected response test items.

    Science.gov (United States)

    Tomey, A M

    1999-01-01

    Classroom assessment is complex and challenging. Teachers need to consider the cognitive, affective, and psychomotor levels for achievement of their educational objectives. This series of six articles discusses how to develop testing blueprints; selected-response tests, including multiple-choice, true-false, matching, or other objective tests; completion or essay testing; problem solving/critical thinking activities; performance assessment; and computer-based testing.

  4. Computerized adaptive testing with item cloning

    NARCIS (Netherlands)

    Glas, Cornelis A.W.; van der Linden, Willem J.

    2003-01-01

    To increase the number of items available for adaptive testing and reduce the cost of item writing, the use of techniques of item cloning has been proposed. An important consequence of item cloning is possible variability between the item parameters. To deal with this variability, a multilevel item

  5. Item Overexposure in Computerized Classification Tests Using Sequential Item Selection

    Directory of Open Access Journals (Sweden)

    Alan Huebner

    2012-06-01

    Computerized classification tests (CCTs) often use sequential item selection, which administers items by maximizing psychometric information at a cut point demarcating passing and failing scores. This paper illustrates why this method of item selection leads to the overexposure of a significant number of items, and the performances of three different methods for controlling maximum item exposure rates in CCTs are compared. Specifically, the Sympson-Hetter, restricted, and item eligibility methods are examined in two studies realistically simulating different types of CCTs and are evaluated based upon criteria including classification accuracy, the number of items exceeding the desired maximum exposure rate, and test overlap. The pros and cons of each method are discussed from a practical perspective.
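
    The Sympson-Hetter method named above throttles each item with an administration probability. A minimal sketch, with invented item ids and control parameters:

```python
import random

# Hedged sketch of Sympson-Hetter exposure control (one of the three
# methods compared in the abstract). Each item i carries a control
# parameter k[i] = P(administer | selected); after the most informative
# item is selected, a Bernoulli experiment decides whether it is actually
# administered or skipped in favor of the next-best item.

def select_item(ranked_items, k, rng):
    """ranked_items: item ids ordered by information at the cut score."""
    for item in ranked_items:
        if rng.random() < k[item]:
            return item
    return ranked_items[-1]  # fall back to the last candidate

rng = random.Random(42)
k = {"i1": 0.25, "i2": 0.9, "i3": 1.0}   # i1 is throttled hard
counts = {"i1": 0, "i2": 0, "i3": 0}
for _ in range(1000):
    counts[select_item(["i1", "i2", "i3"], k, rng)] += 1

# i1's empirical exposure rate is pulled toward k["i1"] ~= 0.25,
# so the second-ranked item absorbs most administrations.
print(counts["i1"] < counts["i2"])
```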

  6. Unidimensional Interpretations for Multidimensional Test Items

    Science.gov (United States)

    Kahraman, Nilufer

    2013-01-01

    This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…

  8. Matrix Sampling of Test Items. ERIC Digest.

    Science.gov (United States)

    Childs, Ruth A.; Jaciw, Andrew P.

    This Digest describes matrix sampling of test items as an approach to achieving broad coverage while minimizing testing time per student. Matrix sampling involves developing a complete set of items judged to cover the curriculum, then dividing the items into subsets and administering one subset to each student. Matrix sampling, by limiting the…
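
    The matrix-sampling idea described above can be sketched in a few lines: deal the complete item set round-robin into disjoint forms, one form per student. Item names here are placeholders:

```python
# Minimal sketch of matrix sampling: a full item set judged to cover the
# curriculum is divided into subsets (forms), and each student is
# administered only one form.

def build_forms(items, n_forms):
    """Deal items round-robin into n_forms disjoint subsets."""
    forms = [[] for _ in range(n_forms)]
    for idx, item in enumerate(items):
        forms[idx % n_forms].append(item)
    return forms

items = [f"item{i:02d}" for i in range(1, 13)]   # 12-item pool
forms = build_forms(items, 3)
print([len(f) for f in forms])                   # → [4, 4, 4]
assert set(items) == {i for f in forms for i in f}  # full coverage, no overlap
```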

  9. Item calibration in incomplete testing designs

    NARCIS (Netherlands)

    Eggen, Theo J.H.M.; Verhelst, Norman D.

    2011-01-01

    This study discusses the justifiability of item parameter estimation in incomplete testing designs in item response theory. Marginal maximum likelihood (MML) as well as conditional maximum likelihood (CML) procedures are considered in three commonly used incomplete designs: random incomplete, multistage testing and targeted testing designs.

  10. Item Calibration in Incomplete Testing Designs

    Science.gov (United States)

    Eggen, Theo J. H. M.; Verhelst, Norman D.

    2011-01-01

    This study discusses the justifiability of item parameter estimation in incomplete testing designs in item response theory. Marginal maximum likelihood (MML) as well as conditional maximum likelihood (CML) procedures are considered in three commonly used incomplete designs: random incomplete, multistage testing and targeted testing designs.…

  11. Classroom Test Writing: Effects of Item Format on Test Quality.

    Science.gov (United States)

    Torabi-Parizi, Rosa; Campbell, Noma Jo

    1982-01-01

    Investigates the effects of varying the placement of blanks and the number of options available in multiple-choice items on the reliability of fifth-grade students' scores. Results indicate that scores on three-choice item tests were not less reliable than scores on four-choice item tests. A similar finding was found regarding the placement of…

  12. Validation of Physics Standardized Test Items

    Science.gov (United States)

    Marshall, Jill

    2008-10-01

    The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5,000-student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation, and we make recommendations for increasing the validity of standardized physics testing.

  13. Algorithmic test design using classical item parameters

    NARCIS (Netherlands)

    van der Linden, Willem J.; Adema, Jos J.

    1988-01-01

    Two optimization models for the construction of tests with a maximal value of coefficient alpha are given. Both models have a linear form and can be solved by using a branch-and-bound algorithm. The first model assumes an item bank calibrated under the Rasch model and can be used, for instance, when…
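
    Coefficient alpha, the quantity these optimization models maximize, is straightforward to compute for a fixed test. A minimal sketch on a toy score matrix (invented data, not from the study):

```python
# Coefficient (Cronbach's) alpha:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
# computed here for a toy 4-person x 3-item score matrix.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):
    """scores: list of per-person lists of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

scores = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(round(cronbach_alpha(scores), 3))   # → 0.75
```

A branch-and-bound test assembler would search over item subsets to maximize this value subject to the test's constraints.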

  14. Bayesian item selection criteria for adaptive testing

    NARCIS (Netherlands)

    van der Linden, Wim J.

    1996-01-01

    R.J. Owen (1975) proposed an approximate empirical Bayes procedure for item selection in adaptive testing. The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity…

  16. Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests

    Science.gov (United States)

    Veldkamp, Bernard P.

    2010-01-01

    Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…

  17. Using automatic item generation to create multiple-choice test items.

    Science.gov (United States)

    Gierl, Mark J; Lai, Hollis; Turner, Simon R

    2012-08-01

    Many tests of medical knowledge, from the undergraduate level to the level of certification and licensure, contain multiple-choice items. Although these are efficient in measuring examinees' knowledge and skills across diverse content areas, multiple-choice items are time-consuming and expensive to create. Changes in student assessment brought about by new forms of computer-based testing have created the demand for large numbers of multiple-choice items. Our current approaches to item development cannot meet this demand. We present a methodology for developing multiple-choice items based on automatic item generation (AIG) concepts and procedures. We describe a three-stage approach to AIG and we illustrate this approach by generating multiple-choice items for a medical licensure test in the content area of surgery. To generate multiple-choice items, our method requires a three-stage process. Firstly, a cognitive model is created by content specialists. Secondly, item models are developed using the content from the cognitive model. Thirdly, items are generated from the item models using computer software. Using this methodology, we generated 1248 multiple-choice items from one item model. Automatic item generation is a process that involves using models to generate items using computer technology. With our method, content specialists identify and structure the content for the test items, and computer technology systematically combines the content to generate new test items. By combining these outcomes, items can be generated automatically. © Blackwell Publishing Ltd 2012.
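
    Stage three of the authors' workflow, generating items from an item model by systematically combining content, can be sketched with a template and a Cartesian product. The stem and content lists below are invented placeholders, not the paper's surgery content:

```python
from itertools import product

# Illustrative sketch of the final AIG stage described above: software
# fills an item model's slots with content elements to generate many
# item stems. All names and strings here are hypothetical.

STEM = ("A patient presents with {symptom} after {event}. "
        "What is the most appropriate first step?")
slots = {
    "symptom": ["acute abdominal pain", "shortness of breath", "fever"],
    "event": ["minor surgery", "a fall", "starting a new medication"],
}

items = [STEM.format(symptom=s, event=e)
         for s, e in product(slots["symptom"], slots["event"])]
print(len(items))   # → 9 (3 x 3 slot combinations)
```

In practice the cognitive model also constrains which combinations are clinically sensible, so not every Cartesian product survives review.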

  18. Optimal item pool design for computerized adaptive tests with polytomous items using GPCM

    Directory of Open Access Journals (Sweden)

    Xuechun Zhou

    2014-09-01

    Computerized adaptive testing (CAT) is a testing procedure with advantages in improving measurement precision and increasing test efficiency. An item pool with optimal characteristics is the foundation for a CAT program to achieve those desirable psychometric features. This study proposed a method to design an optimal item pool for tests with polytomous items using the generalized partial credit model (G-PCM). It extended a method for approximating optimality with polytomous items being described succinctly for the purpose of pool design. Optimal item pools were generated using CAT simulations with and without practical constraints of content balancing and item exposure control. The performances of the item pools were evaluated against an operational item pool. The results indicated that the item pools designed with stratification based on discrimination parameters performed well with an efficient use of the less discriminative items within the target accuracy levels. The implications for developing item pools are also discussed.

  19. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  20. A Process for Reviewing and Evaluating Generated Test Items

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2016-01-01

    Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, rely on a process that is both time-consuming and expensive because each item is written,…

  1. An Item Analysis and Validity Investigation of Bender Visual Motor Gestalt Test Score Items

    Science.gov (United States)

    Lambert, Nadine M.

    1971-01-01

    This investigation attempted to demonstrate the utility of standard item analysis procedures for selecting the most reliable and valid items for scoring Bender Visual Motor Gestalt Test records. (Author)

  2. Investigating Item Exposure Control Methods in Computerized Adaptive Testing

    Science.gov (United States)

    Ozturk, Nagihan Boztunc; Dogan, Nuri

    2015-01-01

    This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…

  3. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    Science.gov (United States)

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The fifty (50) developed economics test items were…

  4. Examining item difficulty and response time on perceptual ability test items.

    Science.gov (United States)

    Yang, Chien-Lin; O'Neill, Thomas R; Kramer, Gene A

    2002-01-01

    This study examined item calibration stability in relation to response time and the levels of item difficulty between different response time groups on a sample of 389 examinees responding to six different subtest items of the Perceptual Ability Test (PAT). The results indicated that no Differential Item Functioning (DIF) was found and that a significant correlation coefficient of item difficulty was observed between slow and fast responders. Three distinct levels of difficulty emerged among the six subtests across groups. Slow responders spent significantly more time than fast responders on the four most difficult subtests. A positive significant relationship was found between item difficulty and response time across groups on the overall perceptual ability test items. Overall, this study found that: 1) the same underlying construct is being measured across groups, 2) the PAT scores were equally useful across groups, 3) different sources of item difficulty may exist among the six subtests, and 4) more difficult test items may require more time to answer.

  5. Alternate item types: continuing the quest for authentic testing.

    Science.gov (United States)

    Wendt, Anne; Kenny, Lorraine E

    2009-03-01

    Many test developers suggest that multiple-choice items can be used to evaluate critical thinking if the items are focused on measuring higher order thinking ability. The literature supports the use of alternate item types to assess additional competencies, such as higher level cognitive processing and critical thinking, as well as ways to allow examinees to demonstrate their competencies differently. This research study surveyed nurses after taking a test composed of alternate item types paired with multiple-choice items. The participants were asked to provide opinions regarding the items and the item formats. Demographic information was also collected, along with information on how the participants responded to the items. The results of this study reveal that the participants thought that, in general, the items were more authentic and allowed them to demonstrate their competence better than multiple-choice items did. Further investigation into the optimal blend of alternate items and multiple-choice items is needed.

  6. A Procedure for Linear Polychotomous Scoring of Test Items

    Science.gov (United States)

    1993-10-01

    …associated with the response categories of test items. When tests are scored using these scoring weights, test reliability increases. The new procedure is… program POLY. The example demonstrates how polyweighting can be used to calibrate and score test items drawn from an item bank that is too large to…

  7. An Analytical Method of Identifying Biased Test Items.

    Science.gov (United States)

    Plake, Barbara S.; Hoover, H. D.

    1979-01-01

    A follow-up technique is needed to identify items contributing to items-by-groups interaction when using an ANOVA procedure to examine a test for biased items. The method described includes distribution theory for assessing level of significance and is sensitive to items at all difficulty levels. (Author/GSK)

  8. Guide to good practices for the development of test items

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-01-01

    While the methodology used in developing test items can vary significantly, to ensure quality examinations, test items should be developed systematically. Test design and development is discussed in the DOE Guide to Good Practices for Design, Development, and Implementation of Examinations. This guide is intended to supplement that document by providing more detailed guidance on the development of specific test items. It primarily addresses the development of written examination test items; however, many of the concepts also apply to oral examinations, both in the classroom and on the job. This guide is intended for the classroom and laboratory instructor or curriculum developer responsible for the construction of individual test items. The document focuses on written test items but also includes information on open-reference (open book) examination test items. These test items have been categorized as short-answer, multiple-choice, or essay. Each test item format is described, examples are provided, and a procedure for development is included. The appendices provide examples for writing test items, a test item development form, and examples of various test item formats.

  9. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    Science.gov (United States)

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  10. Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

    Science.gov (United States)

    Kim, Jihye; Oshima, T. C.

    2013-01-01

    In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
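
    The two adjustments most often studied in this setting are the Bonferroni correction and the Benjamini-Hochberg step-up procedure. A minimal sketch on toy per-item p-values (the abstract does not say which adjustments the study compared):

```python
# Controlling Type I error across many per-item DIF significance tests.

def bonferroni(pvals, alpha=0.05):
    """Reject H0 for items whose p-value clears alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up FDR procedure: find the largest rank i with
    p_(i) <= i/m * alpha and reject all hypotheses up to that rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject

pvals = [0.001, 0.009, 0.02, 0.04, 0.30]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))   # → 2 4
```

The example shows the usual trade-off: Bonferroni controls the familywise error rate but flags fewer items than the less conservative FDR procedure.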

  11. A Review of Test Item Types

    Science.gov (United States)

    2008-03-06

    …for integration with the CAT-ASVAB as it exists now. The CAT-ASVAB currently employs only dichotomously scored items using the 3-parameter logistic model (3PL; Lord & Novick, 1968). IRT models appropriate for polytomously scored items (e.g., Muraki, 1997) are available, and mixing of models is not… problematic within the IRT framework per se. Nevertheless, the current CAT-ASVAB infrastructure is configured to work with the 3PL model only, and
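
    The 3PL model referenced in this record gives the probability of a correct response as a function of ability theta and item parameters a (discrimination), b (difficulty), and c (guessing). A minimal sketch with invented parameter values:

```python
import math

# Three-parameter logistic (3PL) IRT model:
#   P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta == b the examinee sits halfway between the guessing floor c
# and certainty: P = c + (1 - c)/2.
print(round(p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2), 2))   # → 0.6
```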

  12. The Identification of Radex Properties in Objective Test Items.

    Science.gov (United States)

    Seddon, G. M.; And Others

    1981-01-01

    In a Monte Carlo simulation, a methodology was developed to investigate the existence of radex properties among objective test items. In an experiment with items covering four categories of Bloom's cognitive domain taxonomy, the items did not have the factorial properties of a radex with four levels of complexity. (Author/BW)

  13. Multistage Computerized Adaptive Testing with Uniform Item Exposure

    Science.gov (United States)

    Edwards, Michael C.; Flora, David B.; Thissen, David

    2012-01-01

    This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…

  14. An emotional functioning item bank of 24 items for computerized adaptive testing (CAT) was established

    DEFF Research Database (Denmark)

    Petersen, Morten Aa.; Gamper, Eva-Maria; Costantini, Anna

    2016-01-01

    OBJECTIVE: To improve measurement precision, the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing an item bank for computerized adaptive testing (CAT) of emotional functioning (EF). The item bank will be within the conceptual framework… of the widely used EORTC Quality of Life questionnaire (QLQ-C30). STUDY DESIGN AND SETTING: On the basis of literature search and evaluations by international samples of experts and cancer patients, 38 candidate items were developed. The psychometric properties of the items were evaluated in a large… international sample of cancer patients. This included evaluations of dimensionality, item response theory (IRT) model fit, differential item functioning (DIF), and of measurement precision/statistical power. RESULTS: Responses were obtained from 1,023 cancer patients from four countries. The evaluations showed…

  15. Application of Unidimensional Item Response Models to Tests with Items Sensitive to Secondary Dimensions

    Science.gov (United States)

    Zhang, Bo

    2008-01-01

    In this research, the author addresses whether the application of unidimensional item response models provides valid interpretation of test results when administering items sensitive to multiple latent dimensions. Overall, the present study found that unidimensional models are quite robust to the violation of the unidimensionality assumption due…

  16. Estimating the Reliability of a Test Containing Multiple Item Formats.

    Science.gov (United States)

    Qualls, Audrey L.

    1995-01-01

    Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)

  17. Costs of Matrix Sampling of Test Items. ERIC Digest.

    Science.gov (United States)

    Childs, Ruth A.; Jaciw, Andrew P.

    Matrix sampling of test items, the division of a set of items into different versions of a test form, is used by several large-scale testing programs. This Digest discusses nine categories of costs associated with matrix sampling. These categories are: (1) development costs; (2) materials costs; (3) administration costs; (4) educational costs; (5)…

  18. Optimal Bayesian Adaptive Design for Test-Item Calibration

    NARCIS (Netherlands)

    van der Linden, Wim J.; Ren, Hao

    2015-01-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters.

  19. Textiles and Design Library of Test Items. Volume I.

    Science.gov (United States)

    Smith, Jan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  20. Refine test items for accurate measurement: six valuable tips.

    Science.gov (United States)

    Siroky, Karen; Di Leonardi, Bette Case

    2015-01-01

    Nursing Professional Development (NPD) specialists frequently design test items to assess competence, to measure learning outcomes, and to create active learning experiences. This article presents six valuable tips for improving test items and using test results to strengthen validity of measurement. NPD specialists can readily apply these tips and examples to measure knowledge with greater accuracy.

  1. Assessing the quality of multiple-choice test items.

    Science.gov (United States)

    Clifton, Sandra L; Schriner, Cheryl L

    2010-01-01

    With the focus of nursing education geared toward teaching students to think critically, faculty need to ensure that test items require students to use a high level of cognitive processing. To evaluate their examinations, the authors assessed multiple-choice test items on final nursing examinations. The assessment included determining cognitive learning levels and frequency of items among 3 adult health courses, comparing difficulty values with cognitive learning levels, and examining discrimination values and the relationship to distracter performance.

  2. Interim Report of Field Test of Expedient Pavement Repairs (Test Items 1-15).

    Science.gov (United States)

    1980-03-01

    [Garbled OCR of the DTIC report form for: Interim Report of Field Test of Expedient Pavement Repairs (Test Items 1-15), by Raymond S. Rollings. Legible contents entries include: Item 14 Test Results; Density Results, Item 15; Summary of Test Items.]

  3. Testing Linear Models for Ability Parameters in Item Response Models

    NARCIS (Netherlands)

    Glas, Cees A.W.; Hendrawan, Irene

    2005-01-01

    Methods for testing hypotheses concerning the regression parameters in linear models for the latent person parameters in item response models are presented. Three tests are outlined: a likelihood ratio test, a Lagrange multiplier test and a Wald test. The tests are derived in a marginal maximum likelihood…

  4. Differential functioning of Bender Visual-Motor Gestalt Test items.

    Science.gov (United States)

    Sisto, Fermino Fernandes; Dos Santos, Acácia Aparecida Angeli; Noronha, Ana Paula Porto

    2010-02-01

    Differential Item Functioning (DIF) refers to items that do not function the same way for comparable members of different groups. The present study focuses on analyzing and classifying sex-related differential item functioning in the Bender Visual-Motor Gestalt Test. Subjects were 1,052 children attending public schools (513 boys, 539 girls, ages 6-10 years). The protocols were scored using the Bender Graduated Scoring System, which evaluates only the distortion criterion using the Rasch logistic response model. The scoring system fit the Rasch model, although two items were found to be biased by sex. When analyzing differential functioning of items for boys and girls separately, the number of differentially functioning items was equal.

  5. Mathematical-programming approaches to test item pool design

    NARCIS (Netherlands)

    Veldkamp, Bernard P.; van der Linden, Willem J.; Ariel, A.

    2002-01-01

    This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming

  6. Using response times for item selection in adaptive testing

    NARCIS (Netherlands)

    Linden, van der Wim J.

    2008-01-01

    Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a s

  7. Group differences in the heritability of items and test scores

    NARCIS (Netherlands)

    Wicherts, J.M.; Johnson, W.

    2009-01-01

    It is important to understand potential sources of group differences in the heritability of intelligence test scores. On the basis of a basic item response model we argue that heritabilities which are based on dichotomous item scores normally do not generalize from one sample to the next. If groups

  8. A lognormal model for response times on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.

    2006-01-01

    A lognormal model for the response times of a person on a set of test items is investigated. The model has a parameter structure analogous to the two-parameter logistic response models in item response theory, with a parameter for the speed of each person as well as parameters for the time intensity
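
    Under the common parameterization of this model, the log of response time is normal with mean beta_i - tau_j (item time intensity minus person speed) and standard deviation 1/alpha_i. A minimal sketch of the implied expected response time, with invented parameter values:

```python
import math

# Lognormal response-time model (common parameterization):
#   ln T_ij ~ Normal(beta_i - tau_j, 1 / alpha_i**2)
# tau_j  : person speed, beta_i : item time intensity,
# alpha_i: discrimination-like parameter for the times.

def expected_time(alpha, beta, tau):
    """Mean of the lognormal: exp(beta - tau + 1 / (2 * alpha**2))."""
    return math.exp(beta - tau + 1.0 / (2.0 * alpha ** 2))

# A faster person (larger tau) is expected to finish the same item sooner.
slow = expected_time(alpha=2.0, beta=4.0, tau=0.0)
fast = expected_time(alpha=2.0, beta=4.0, tau=0.5)
print(fast < slow)   # → True
```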

  9. Item selection and ability estimation adaptive testing

    NARCIS (Netherlands)

    Pashley, Peter J.; van der Linden, Wim J.; van der Linden, Willem J.; Glas, Cornelis A.W.; Glas, Cees A.W.

    2010-01-01

    The last century saw tremendous progress in the refinement and use of standardized linear tests. The first College Board exam was administered in 1901, and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, progressively more sophisticated standardized linear tests

  10. Quantitative penetration testing with item response theory

    NARCIS (Netherlands)

    Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle

    2014-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Thus, penetration testing has so far been used as a qualitative research method. To enable quantitative approaches to security risk management, including

  11. Quantitative penetration testing with item response theory

    NARCIS (Netherlands)

    Pieters, W.; Arnold, F.; Stoelinga, M.I.A.

    2013-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Therefore, penetration testing has thus far been used as a qualitative research method. To enable quantitative approaches to security risk management, in

  13. Optimal Bayesian Adaptive Design for Test-Item Calibration.

    Science.gov (United States)

    van der Linden, Wim J; Ren, Hao

    2015-06-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
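
    The D- versus A-optimality comparison in the abstract can be illustrated on toy posterior information matrices; the item names and matrix values below are hypothetical, chosen deliberately so the two criteria disagree.

```python
import numpy as np

def d_optimal(infos):
    """D-optimality: choose the candidate whose information matrix has the
    largest determinant (smallest posterior 'volume' for its parameters)."""
    return max(infos, key=lambda k: float(np.linalg.det(infos[k])))

def a_optimal(infos):
    """A-optimality: choose the candidate minimizing the trace of the inverse
    information matrix (sum of posterior parameter variances)."""
    return min(infos, key=lambda k: float(np.trace(np.linalg.inv(infos[k]))))

# Hypothetical 2x2 information matrices, one row/column per parameter of a
# two-parameter field-test item.
candidates = {
    "item_a": np.diag([20.0, 0.25]),  # one parameter pinned down, one poorly
    "item_b": np.diag([2.0, 2.0]),    # both parameters moderately well
    "item_c": np.diag([1.0, 1.0]),
}
```

Here D-optimality prefers item_a (determinant 5 versus 4) while A-optimality prefers item_b (variance sum 1.0 versus 4.05), which illustrates why the two criteria can calibrate items at different speeds.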

  14. Differential Weighting of Items to Improve University Admission Test Validity

    OpenAIRE

    Eduardo Backhoff Escudero; Felipe Tirado Segura; Norma Larrazolo Reyna

    2001-01-01

    This paper gives an evaluation of different ways to increase university admission test criterion-related validity, by differentially weighting test items. We compared four methods of weighting multiple-choice items of the Basic Skills and Knowledge Examination (EXHCOBA): (1) punishing incorrect responses by a constant factor, (2) weighting incorrect responses, considering the levels of error, (3) weighting correct responses, considering the item's difficulty, based on the Classic Measurement Theory…

  15. Test Item Linguistic Complexity and Assessments for Deaf Students

    Science.gov (United States)

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64…

  17. Modeling Answer Changes on Test Items

    Science.gov (United States)

    van der Linden, Wim J.; Jeon, Minjeong

    2012-01-01

    The probability of test takers changing answers upon review of their initial choices is modeled. The primary purpose of the model is to check erasures on answer sheets recorded by an optical scanner for numbers and patterns that may be indicative of irregular behavior, such as teachers or school administrators changing answer sheets after their…

  18. Assessing Differential Item Functioning in Performance Tests.

    Science.gov (United States)

    Zwick, Rebecca; And Others

    Although the belief has been expressed that performance assessments are intrinsically more fair than multiple-choice measures, some forms of performance assessment may in fact be more likely than conventional tests to tap construct-irrelevant factors. As performance assessment grows in popularity, it will be increasingly important to monitor the…

  19. Item response times in computerized adaptive testing

    Directory of Open Access Journals (Sweden)

    Lutz F. Hornke

    2000-01-01

    Full Text Available Computerized adaptive tests (CATs) yield item response times in addition to scores. Research into the additional meaning that can be extracted from response-time information is of particular interest. Data from 5912 young adults who took a computerized adaptive test were available. Earlier studies report longer response times when answers are incorrect, a result replicated in this larger study. However, mean item response times for wrong and for correct answers do not support an interpretation different from that given by the trait levels, nor do they correlate differently with a set of ability tests. Whether response times should be interpreted on the same dimension the CAT measures or on other dimensions is discussed. Since the early 1930s, response times have been considered indicators of personality traits to be distinguished from the traits measured by test scores. This idea is discussed, and arguments for and against it are presented. More recent model-based approaches are also shown. Whether additional diagnostic information can be obtained from a CAT with detailed, programmed data collection remains an open question.

  20. Selecting Test Item Types To Evaluate Library Skills.

    Science.gov (United States)

    Fagan, Jody Condit

    2002-01-01

    This article outlines the advantages and disadvantages of various question types in tests for library classes, including selected-response, constructed-response and alternative-response test items. It examines a test case in which students in a for-credit library course were given a take home quiz with search story problems. Sample "search story"…

  1. Adaptive testing for psychological assessment: how many items are enough to run an adaptive testing algorithm?

    Science.gov (United States)

    Wagner-Menghin, Michaela M; Masters, Geoff N

    2013-01-01

    Although the principles of adaptive testing were established in the psychometric literature many years ago (e.g., Weiss, 1977), and the practice of adaptive testing is established in educational assessment, it is not yet widespread in psychological assessment. One obstacle to adaptive psychological testing is a lack of clarity about the necessary number of items to run an adaptive algorithm. The study explores the relationship between item bank size, test length, and measurement precision. Simulated adaptive test runs (allowing a maximum of 30 items per person) out of an item bank with 10 items per ability level (covering .5 logits, 150 items total) yield a standard error of measurement (SEM) of .47 (.39) after an average of 20 (29) items for 85-93% (64-82%) of the simulated rectangular sample. Expanding the bank to 20 items per level (300 items total) did not improve the algorithm's performance significantly. With a small item bank (5 items per ability level, 75 items total) it is possible to reach the same SEM as with a conventional test but with fewer items, or a better SEM with the same number of items.
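
    The kind of simulation reported here can be sketched in a few lines for the Rasch case. Everything below (the bank size and distribution, the closest-difficulty selection rule, the clamped Newton update, the stopping SEM of .47) is an illustrative reconstruction, not the authors' code.

```python
import math
import random

random.seed(7)
BANK = [random.gauss(0.0, 1.5) for _ in range(150)]   # Rasch item difficulties

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_cat(true_theta, max_items=30, target_sem=0.47):
    available = list(BANK)
    administered, responses = [], []
    theta = 0.0
    sem = float("inf")
    for _ in range(max_items):
        # for the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate
        b = min(available, key=lambda d: abs(d - theta))
        available.remove(b)
        administered.append(b)
        responses.append(random.random() < p_correct(true_theta, b))
        for _ in range(5):   # a few Newton-Raphson steps toward the MLE
            ps = [p_correct(theta, d) for d in administered]
            grad = sum(u - p for u, p in zip(responses, ps))
            info = sum(p * (1.0 - p) for p in ps)
            theta = max(-4.0, min(4.0, theta + grad / info))  # clamp early runs
        sem = 1.0 / math.sqrt(info)
        if sem <= target_sem:
            break
    return theta, sem, len(administered)
```

One simulated examinee, e.g. `simulate_cat(1.0)`, returns the final estimate, its SEM, and how many items the stopping rule needed.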

  2. A Method for Severely Constrained Item Selection in Adaptive Testing.

    Science.gov (United States)

    Stocking, Martha L.; Swanson, Len

    1993-01-01

    A method is presented for incorporating a large number of constraints on adaptive item selection in the construction of computerized adaptive tests. The method, which emulates practices of expert test specialists, is illustrated for verbal and quantitative measures. Its foundation is application of a weighted deviations model and algorithm. (SLD)

  3. Differential Item Functioning on Two Tests of EFL Proficiency.

    Science.gov (United States)

    Ryan, Katherine E.; Bachman, Lyle F.

    1992-01-01

    The extent to which items from the Test of English as a Foreign Language and the First Certificate in English function differently for test-takers of equal ability from different native language and curricular backgrounds was investigated. Results suggest a need for methods like logistic regression to examine nonuniform differential item…

  4. Differential Weighting of Items to Improve University Admission Test Validity

    Directory of Open Access Journals (Sweden)

    Eduardo Backhoff Escudero

    2001-05-01

    Full Text Available This paper gives an evaluation of different ways to increase university admission test criterion-related validity, by differentially weighting test items. We compared four methods of weighting multiple-choice items of the Basic Skills and Knowledge Examination (EXHCOBA): (1) punishing incorrect responses by a constant factor, (2) weighting incorrect responses, considering the levels of error, (3) weighting correct responses, considering the item's difficulty, based on the Classic Measurement Theory, and (4) weighting correct responses, considering the item's difficulty, based on the Item Response Theory. Results show that none of these methods increased the instrument's predictive validity, although they did improve its concurrent validity. It was concluded that it is appropriate to score the test by simply adding up correct responses.
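
    Two of the four weighting schemes are simple enough to sketch; the p-values, response vector, and option count below are hypothetical, and the method numbering follows the abstract.

```python
# Hypothetical five-item illustration: p_values are classical item difficulties
# (proportion correct in a norm group), responses are one examinee's 0/1 scores.
p_values  = [0.9, 0.7, 0.5, 0.3, 0.1]   # easy -> hard
responses = [1, 1, 1, 0, 1]
n_options = 5                           # alternatives per multiple-choice item

number_right = sum(responses)

# Method (1): formula scoring, punishing incorrect responses by a constant.
wrong = len(responses) - number_right
formula_score = number_right - wrong / (n_options - 1)

# Method (3): CTT-style weighting, harder items earn more credit when correct.
ctt_weighted = sum(u * (1.0 - p) for u, p in zip(responses, p_values))
```

Under these toy numbers the examinee scores 4 by number-right, 3.75 by formula scoring, and 1.8 by difficulty weighting; the ranking of examinees, not the raw scale, is what matters for validity.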

  5. Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

    Science.gov (United States)

    Matlock, Ki Lynn; Turner, Ronna

    2016-01-01

    When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

  6. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    Science.gov (United States)

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

    As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's a, Feldt-Raju, stratified a, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…

  7. Some new item selection criteria for adaptive testing

    NARCIS (Netherlands)

    Veerkamp, Wim J.J.; Berger, Martijn P.F.

    1994-01-01

    In this study some alternative item selection criteria for adaptive testing are proposed. These criteria take into account the uncertainty of the ability estimates. A general weighted information criterion is suggested, of which the usual maximum information criterion and the suggested alternative criteria…
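
    A point-estimate maximum-information rule, plus one uncertainty-aware variant in the spirit of the proposal (averaging information over an interval around the ability estimate), can be sketched as follows. The pool and the grid are illustrative; this is not the authors' exact criterion.

```python
import math

def info_2pl(theta, a, b):
    # Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p).
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical pool of (discrimination a, difficulty b) pairs.
POOL = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.5), (2.0, 1.2)]

def max_info_item(theta, items=POOL):
    # Classical criterion: maximize information at the point estimate.
    return max(items, key=lambda ab: info_2pl(theta, *ab))

def interval_info_item(theta, se, items=POOL, grid=11):
    # Uncertainty-aware sketch: sum information over a grid spanning
    # theta +/- se rather than evaluating at a single point.
    thetas = [theta - se + 2.0 * se * k / (grid - 1) for k in range(grid)]
    return max(items, key=lambda ab: sum(info_2pl(t, *ab) for t in thetas))
```

When the ability estimate is still uncertain early in the test, the interval criterion is less prone to "wasting" highly discriminating items on a poorly located estimate.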

  8. DIF Analysis for Pretest Items in Computer-Adaptive Testing.

    Science.gov (United States)

    Zwick, Rebecca; And Others

    A simulation study of methods of assessing differential item functioning (DIF) in computer-adaptive tests (CATs) was conducted by Zwick, Thayer, and Wingersky (in press, 1993). Results showed that modified versions of the Mantel-Haenszel and standardization methods work well with CAT data. DIF methods were also investigated for nonadaptive…

  9. Differential Item Functioning on the Graduate Management Admission Test.

    Science.gov (United States)

    O'Neill, Kathleen A.; And Others

    The purpose of this study was to identify differentially functioning items on operational administrations of the Graduate Management Admission Test (GMAT) through the use of the Mantel-Haenszel statistic. Retrospective analyses of data collected over 3 years are reported for black/white and female/male comparisons for the Verbal and Quantitative…

  10. Analysis of Individual "Test Of Astronomy STandards" (TOAST) Item Responses

    Science.gov (United States)

    Slater, Stephanie J.; Schleigh, Sharon Price; Stork, Debra J.

    2015-01-01

    The development of valid and reliable strategies to efficiently determine the knowledge landscape of introductory astronomy college students is an effort of great interest to the astronomy education community. This study examines individual item response rates from a widely used conceptual understanding survey, the Test Of Astronomy Standards…

  11. Expected linking error resulting from item parameter drift among the common Items on Rasch calibrated tests.

    Science.gov (United States)

    Miller, G Edward; Gesn, Paul Randall; Rotou, Jourania

    2005-01-01

    In state assessment programs that employ Rasch-based common item linking procedures, the linking constant is usually estimated with only those common items not identified as exhibiting item difficulty parameter drift. Since state assessments typically contain a fixed number of items, an item classified as exhibiting parameter drift during the linking process remains on the exam as a scorable item even if it is removed from the common item set. Under the assumption that item parameter drift has occurred for one or more of the common items, the expected effect of including or excluding the "affected" item(s) in the estimation of the linking constant is derived in this article. If the item parameter drift is due solely to factors not associated with a change in examinee achievement, no linking error will (be expected to) occur given that the linking constant is estimated only with the items not identified as "affected"; linking error will (be expected to) occur if the linking constant is estimated with all common items. However, if the item parameter drift is due solely to change in examinee achievement, the opposite is true: no linking error will (be expected to) occur if the linking constant is estimated with all common items; linking error will (be expected to) occur if the linking constant is estimated only with the items not identified as "affected".
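
    The mechanics are easy to see in a toy mean-difference linking computation; the difficulty values and the size of the drift below are invented for illustration.

```python
# Hypothetical Rasch difficulties for five common items on the base form and
# on a new form; the last item has drifted upward beyond the common shift.
base = [-1.0, -0.5, 0.0, 0.5, 1.0]
new  = [-0.8, -0.3, 0.2, 0.7, 1.9]

def linking_constant(b_new, b_base):
    # Mean-difference linking constant for Rasch common-item equating.
    return sum(n - b for n, b in zip(b_new, b_base)) / len(b_base)

with_drifted   = linking_constant(new, base)          # all five common items: 0.34
drift_screened = linking_constant(new[:4], base[:4])  # drifted item removed: 0.20
```

Which of the two constants is the "error-free" one depends, as the abstract notes, on whether the drift reflects a real change in examinee achievement or a construct-irrelevant change in the item.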

  12. Designing multiple-choice test items at higher cognitive levels.

    Science.gov (United States)

    Su, Whei Ming; Osisek, Paul J; Montgomery, Cynthia; Pellar, Suzanne

    2009-01-01

    In the midst of a nursing faculty shortage, many academic institutions hire clinicians who are not formally prepared for an academic role. These novice faculty face an immediate need to develop teaching skills. One area in particular is test construction. To address this need, the authors describe how faculty from one course designed multiple-choice test items at higher cognitive levels and simultaneously achieved congruence with critical-thinking learning objectives defined by the course.

  13. The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education.

    Science.gov (United States)

    Downing, Steven M

    2005-01-01

    The purpose of this research was to study the effects of violations of standard multiple-choice item writing principles on test characteristics, student scores, and pass-fail outcomes. Four basic science examinations, administered to year-one and year-two medical students, were randomly selected for study. Test items were classified as either standard or flawed by three independent raters, blinded to all item performance data. Flawed test questions violated one or more standard principles of effective item writing. Thirty-six to sixty-five percent of the items on the four tests were flawed. Flawed items were 0-15 percentage points more difficult than standard items measuring the same construct. Over all four examinations, 646 (53%) students passed the standard items while 575 (47%) passed the flawed items. The median passing rate difference between flawed and standard items was 3.5 percentage points, but ranged from -1 to 35 percentage points. Item flaws had little effect on test score reliability or other psychometric quality indices. Results showed that flawed multiple-choice test items, which violate well established and evidence-based principles of effective item writing, disadvantage some medical students. Item flaws introduce the systematic error of construct-irrelevant variance to assessments, thereby reducing the validity evidence for examinations and penalizing some examinees.

  14. Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

    Science.gov (United States)

    Jones, Andrew T.

    2011-01-01

    Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…

  15. Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

    Science.gov (United States)

    He, Wei; Reckase, Mark D.

    2014-01-01

    For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…

  16. A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

    Science.gov (United States)

    Guo, Rui; Zheng, Yi; Chang, Hua-Hua

    2015-01-01

    An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

  17. Relation of field independence and test-item format to student performance on written piagetian tests

    Science.gov (United States)

    López-Rupérez, F.; Palacios, C.; Sánchez, J.

    In this study we have investigated the relationship between the field-dependence-independence (FDI) dimension as measured by the Group Embedded Figures Test (GEFT) and subject performance on the Longeot test, a pencil-and-paper Piagetian test, through the open or closed format of its items. The sample consisted of 141 high school students. Correlation and variance analysis show that the FDI dimension and GEFT correlate significantly on only those items on the Longeot test that require formal reasoning. The effect of open- or closed-item format is found exclusively for formal items; only the open format discriminates significantly (at the 0.01 level) between the field-dependent and -independent subjects performing on this type of item. Some implications of these results for science education are discussed.

  18. The use of predicted values for item parameters in item response theory models: An application in intelligence tests

    NARCIS (Netherlands)

    Matteucci, M.; S. Mignani, Prof.; Veldkamp, Bernard P.

    2012-01-01

    In testing, item response theory models are widely used in order to estimate item parameters and individual abilities. However, even unidimensional models require a considerable sample size so that all parameters can be estimated precisely. The introduction of empirical prior information about candidates…

  19. Field Test of Expedient Pavement Repairs (Test Items 16-35).

    Science.gov (United States)

    1980-11-01

    Air Force Engineering and Services Center, Tyndall AFB: Field Test of Expedient Pavement Repairs (Test Items 16-35), Michael T. McNerney, Engineering Research Division. Final report, July 1978 - September 1979, published November 1980.

  20. Differential item functioning in the figure classification test

    Directory of Open Access Journals (Sweden)

    E. van Zyl

    1998-06-01

    Full Text Available The elimination of unfair discrimination and cultural bias of any kind is a contentious workplace issue in contemporary South Africa. To ensure fairness in testing, psychometric instruments are subjected to empirical investigations for the detection of possible bias that could lead to selection decisions constituting unfair discrimination. This study was conducted to explore the possible existence of differential item functioning (DIF), or potential bias, in the Figure Classification Test (A121) by means of the Mantel-Haenszel chi-square technique. The sample consisted of 498 men at a production company in the Western Cape. Although statistical analysis revealed significant differences between the mean test scores of three racial groups on the test, very few items were identified as having statistically significant DIF. The possibility is discussed that, despite the presence of some DIF, the differences between the means may not be due to the measuring instrument itself being biased, but rather to extraneous sources of variation, such as the unequal education and socio-economic backgrounds of the racial groups. It was concluded that there is very little evidence of item bias in the test.

  1. Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

    2016-01-01

    Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…

  3. An Effect Size Measure for Raju's Differential Functioning for Items and Tests

    Science.gov (United States)

    Wright, Keith D.; Oshima, T. C.

    2015-01-01

    This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…

  4. A Model-Based Method for Content Validation of Automatically Generated Test Items

    Science.gov (United States)

    Zhang, Xinxin; Gierl, Mark

    2016-01-01

    The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…

  5. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    Science.gov (United States)

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism.

  6. Stochastic order in dichotomous item response models for fixed tests, adaptive tests, or multiple abilities

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1995-01-01

    Dichotomous item response theory (IRT) models can be viewed as families of stochastically ordered distributions of responses to test items. This paper explores several properties of such distributions. The focus is on the conditions under which stochastic order in families of conditional distributions…

  7. A Practical Procedure for the Construction and Reliability Analysis of Fixed Length Tests with Random Drawn Test Items

    NARCIS (Netherlands)

    Draaijer, S.; Klinkenberg, S.

    2015-01-01

    A procedure to construct valid and fair fixed-length tests with randomly drawn items from an item bank is described. The procedure provides guidelines for the set-up of a typical achievement test with regard to the number of items in the bank and the number of items for each position in a test. Further…

  8. A Procedure for Linear Polychotomous Scoring of Test Items (Computer Diskette).

    Science.gov (United States)

    …weights that are then associated with the response categories of test items. When tests are scored using these scoring weights, test reliability … program poly. The example demonstrates how polyweighting can be used to calibrate and score test items drawn from an item bank that is too large to …

  9. Small-Item Contact Test Method, FY11 Release

    Science.gov (United States)

    2012-07-01

    Glossary excerpts: … sections on an item, often designated in a sampling plan and marked on an item to enable quick reference. sessile drop: a liquid droplet that is … contamination set: a specific contamination density, drop volume, and deposition pattern combination used for dose confirmation. (Figure titles: full item contamination illustration; localized contamination illustration; gross sample collection technique for small items.)

  10. Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items

    Science.gov (United States)

    Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong

    2012-01-01

    For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…
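
    The MAP machinery behind such estimators can be sketched for the dichotomous part with a grid search; the prior, grid, and 2PL likelihood below are illustrative, and the paper's adaptive weighting of polytomous items is not reproduced here.

```python
import math

def map_estimate(responses, items, prior_sd=1.0, lo=-4.0, hi=4.0, grid=401):
    """Grid-search MAP: maximize a standard-normal-style log prior plus the
    2PL log-likelihood over an equally spaced ability grid. Dichotomous
    items only; responses are 0/1, items are (a, b) pairs."""
    best_theta, best_val = lo, -math.inf
    for k in range(grid):
        theta = lo + (hi - lo) * k / (grid - 1)
        val = -0.5 * (theta / prior_sd) ** 2          # log prior, up to a constant
        for u, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            val += math.log(p) if u else math.log(1.0 - p)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta
```

The prior pulls extreme estimates toward zero, which is what keeps MAP finite even for all-correct or all-incorrect response patterns.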

  11. The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.

    Science.gov (United States)

    de Gruijter, Dato N. M.

    Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…

  12. Detection of person misfit in computerized adaptive tests with polytomous items

    NARCIS (Netherlands)

    van Krimpen-Stoop, Edith M.L.A.; Meijer, Rob R.

    2002-01-01

    Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated. For a computerized adaptive test (CAT) using dichotomous items, several person-fit statistics for detecting misfitting item score patterns have been proposed. Both for paper-and-pencil…

  14. A Simulation Study of Methods for Assessing Differential Item Functioning in Computer-Adaptive Tests.

    Science.gov (United States)

    Zwick, Rebecca; And Others

    Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel and standardization methods of differential item functioning (DIF) analysis in computer-adaptive tests (CATs). Each "examinee" received 25 items out of a 75-item pool. A three-parameter logistic item response model was assumed, and…
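
    The (non-adaptive) Mantel-Haenszel statistic that these CAT studies modify can be sketched directly from score-matched 2x2 tables; the tables below are invented, and the ETS delta transform is shown for context.

```python
import math

# Hypothetical score-matched 2x2 tables, one per matching score level:
# (reference correct, reference wrong, focal correct, focal wrong).
TABLES = [
    (40, 10, 30, 20),
    (30, 20, 25, 25),
    (20, 30, 10, 40),
]

def mh_common_odds_ratio(tables):
    """Mantel-Haenszel estimator of the common odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

alpha_mh = mh_common_odds_ratio(TABLES)
# ETS delta metric: large absolute values (beyond roughly 1.5, with
# significance) conventionally flag substantial DIF.
mh_d_dif = -2.35 * math.log(alpha_mh)
```

An odds ratio above 1 (negative MH D-DIF) means the item favors the reference group after matching on total score; the CAT modifications in the abstract replace the observed-score matching with ability-estimate matching.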

  15. Determining differential item functioning and its effect on the test scores of selected PIB indexes, using item response theory techniques

    Directory of Open Access Journals (Sweden)

    Pieter Schaap

    2001-02-01

    Full Text Available The objective of this article is to present the results of an investigation into the item and test characteristics of two tests of the Potential Index Batteries (PIB) in terms of differential item functioning (DIF) and the effect thereof on the test scores of different race groups. The English Vocabulary (Index 12) and Spelling (Index 22) tests of the PIB were analysed for white, black and coloured South Africans. Item response theory (IRT) methods were used to identify items which function differentially for the white, black and coloured race groups.

  16. The Psychological Effect of Errors in Standardized Language Test Items on EFL Students' Responses to the Following Item

    Science.gov (United States)

    Khaksefidi, Saman

    2017-01-01

    This study investigates the psychological effect of a flawed question with flawed answer options on responses to the next question in a test of structure. Forty students selected through stratified random sampling were given 15 questions from a standardized test, namely a TOEFL structure test, in which questions 7 and 11 were flawed, and their answers…

  17. Testing psychometric properties of the 30-item general health questionnaire.

    Science.gov (United States)

    Klainin-Yobas, Piyanee; He, Hong-Gu

    2014-01-01

    This study aimed to evaluate the psychometric properties of the General Health Questionnaire (GHQ-30), given conflicting findings in the literature. A cross-sectional, nonexperimental design was used with a convenience sample of 271 American female health care professionals. Data were collected using self-report questionnaires. A series of exploratory factor analyses (EFAs), confirmatory factor analyses (CFAs), and structural equation models (SEM) were performed to examine the underlying dimensions of the GHQ-30. Results from the EFAs and CFAs revealed a three-factor structure (positive affect, anxiety, and depressed mood). All factor loadings were statistically significant, and one pair of error variances was allowed to correlate. All factors contained questionnaire items with acceptable face validity and demonstrated good internal consistency reliability. Results from the SEM further confirmed the underlying constructs of the scale. To our knowledge, this is the first study to test the psychometric properties of the GHQ-30 extensively, taking both statistical and substantive issues into consideration.

  18. Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

    Science.gov (United States)

    Marie, S. Maria Josephine Arokia; Edannur, Sreekala

    2015-01-01

    This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…

  19. What factors make science test items especially difficult for students from minority groups?

    Directory of Open Access Journals (Sweden)

    Are Turmo

    2012-06-01

    Substantial gaps in science performance between majority and minority students are often found in standardized tests used in primary school, but at the item level the gaps may vary significantly. The aims of this study are: (1) to identify features of science test items (grades 5 and 8) that can potentially explain group differences; and (2) to analyze what factors make test items especially difficult for minority students. Explanatory variables such as reading load, item difficulty, item writing load, and use of the multiple-choice format are found to be major factors. The analysis reveals no empirical relationship between the performance gap and the item's subject domain, its location in the test, or the number of illustrations used in the item. Subtle issues in the design of items may influence the size of the performance gap at item level over and above the main explanatory variables. The gap can be reduced significantly by choosing "minority friendly" items.

  20. Test Accessibility: Item Reviews and Lessons Learned from Four State Assessments

    Directory of Open Access Journals (Sweden)

    Peter A. Beddow

    2013-01-01

    The push toward universally designed assessments has influenced several states to modify items from their general achievement tests to improve their accessibility for all test takers. The current study involved the review of 159 items used by one state across four content areas including science, coupled with the review of 261 science items in three other states. The item reviews were conducted using the Accessibility Rating Matrix (Beddow et al., 2009), a tool for systematically identifying access barriers in test items and for facilitating the subsequent modification process. The design allowed for within-state comparisons across several variables for one state and for within-content-area (i.e., science) comparisons across states. Findings indicated that few items were optimally accessible and that ratings were consistent across content areas, states, grade bands, and item types. Suggestions for modifying items are discussed and recommendations are offered to guide the development of optimally accessible test items.

  1. Development of a lack of appetite item bank for computer-adaptive testing (CAT)

    DEFF Research Database (Denmark)

    Thamsborg, Lise Laurberg Holst; Petersen, Morten Aa; Aaronson, Neil K

    2015-01-01

    measurement precision. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30 questionnaire. Here, we report on the development of the lack of appetite CAT. METHODS: The EORTC approach to CAT development comprises four phases: literature search, operationalization, pre-testing, and field testing. Phases 1-3 are described in this paper. First, a list of items was retrieved from the literature. This was refined, deleting redundant and irrelevant items. Next, new items fitting the "QLQ-C30 item style" were created. These were evaluated by international samples of experts and cancer patients … to 12 lack of appetite items. CONCLUSIONS: Phases 1-3 resulted in 12 lack of appetite candidate items. Based on the field testing (phase 4), the psychometric characteristics of the items will be assessed and the final item bank will be generated. This CAT item bank is expected to provide precise…

  2. Origin bias of test items compromises the validity and fairness of curriculum comparisons

    NARCIS (Netherlands)

    Muijtjens, Arno M. M.; Schuwirth, Lambert W. T.; Cohen-Schotanus, Janke; van der Vleuten, Cees P. M.

    2007-01-01

    OBJECTIVE To determine whether items of progress tests used for inter-curriculum comparison favour students from the medical school where the items were produced (i.e. whether the origin bias of test items is a potential confounder in comparisons between curricula). METHODS We investigated scores of

  3. Do Linguistic Features of Science Test Items Prevent English Language Learners from Demonstrating Their Knowledge?

    Science.gov (United States)

    Noble, Tracy; Kachchaf, Rachel; Rosebery, Ann; Warren, Beth; O'Connor, Mary Catherine; Wang, Yang

    2014-01-01

    Little research has examined individual linguistic features that influence English language learners' (ELLs') test performance. Furthermore, research has yet to explore the relationship between the science strand of test items and the types of linguistic features the items include. Utilizing differential item functioning, this study examines ELL…

  4. Multidimensional Linking for Tests with Mixed Item Types

    Science.gov (United States)

    Yao, Lihua; Boughton, Keith

    2009-01-01

    Numerous assessments contain a mixture of multiple choice (MC) and constructed response (CR) item types and many have been found to measure more than one trait. Thus, there is a need for multidimensional dichotomous and polytomous item response theory (IRT) modeling solutions, including multidimensional linking software. For example,…

  5. Studies on statistical models for polytomously scored test items

    NARCIS (Netherlands)

    Akkermans, Wies

    1998-01-01

    This dissertation, which is structured as a collection of self-contained papers, is concerned mainly with differences between item response models. The purpose of item response theory (IRT) is the estimation of a hypothesized latent variable, such as, for example, intelligence or ability in geography…

  6. Visual Items in Tests of Intelligence for Children.

    Science.gov (United States)

    Wyver, Shirley R.; Markham, Roslyn; Hlavacek, Sonia

    1999-01-01

    A study compared the performance of 15 children (ages 5-12) with visual impairments and 15 controls on the Comprehension and Similarities items of the Wechsler Intelligence Scale for Children-Revised. Results indicated the children with visual impairments were disadvantaged by comprehension-type items with high visual content. (CR)

  7. Statistical tests of conditional independence between responses and/or response times on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.; Glas, Cornelis A.W.

    2010-01-01

    Three plausible assumptions of conditional independence in a hierarchical model for responses and response times on test items are identified. For each of the assumptions, a Lagrange multiplier test of the null hypothesis of conditional independence against a parametric alternative is derived. The t

  8. Employment of Item Response Theory to measure change in Children's Analogical Thinking Modifiability Test

    OpenAIRE

    Queiroz,Odoisa Antunes de; Primi,Ricardo; Carvalho,Lucas de Francisco; Enumo,Sônia Regina Fiorim

    2013-01-01

    Dynamic testing, with an intermediate phase of assistance, measures changes between pretest and post-test, assuming a common metric between them. To test this assumption, we applied item response theory to the responses of 69 children to an adapted version of the Children's Analogical Thinking Modifiability Test, a dynamic cognitive test with 12 items (828 responses in total), with the purpose of verifying whether the original scale yields the same results as the equalized scale obtained by item response theory i...

  9. Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests.

    Science.gov (United States)

    Fox, Mark C; Berry, Jane M; Freeman, Sara P

    2014-12-01

    Relatively high vocabulary scores of older adults are generally interpreted as evidence that older adults possess more of a common ability than younger adults. Yet, this interpretation rests on empirical assumptions about the uniformity of item-response functions between groups. In this article, we test item response models of differential responding against datasets containing younger-, middle-aged-, and older-adult responses to three popular vocabulary tests (the Shipley, Ekstrom, and WAIS-R) to determine whether members of different age groups who achieve the same scores have the same probability of responding in the same categories (e.g., correct vs. incorrect) under the same conditions. Contrary to the null hypothesis of measurement invariance, datasets for all three tests exhibit substantial differential responding. Members of different age groups who achieve the same overall scores exhibit differing response probabilities in relation to the same items (differential item functioning) and appear to approach the tests in qualitatively different ways that generalize across items. Specifically, younger adults are more likely than older adults to leave items unanswered for partial credit on the Ekstrom, and to produce 2-point definitions on the WAIS-R. Yet, older adults score higher than younger adults, consistent with most reports of vocabulary outcomes in the cognitive aging literature. In light of these findings, the most generalizable conclusion to be drawn from the cognitive aging literature on vocabulary tests is simply that older adults tend to score higher than younger adults, and not that older adults possess more of a common ability.

  10. Analysis of Software Test Item Generation- Comparison Between High Skilled and Low Skilled Engineers

    Institute of Scientific and Technical Information of China (English)

    Masayuki Hirayama; Osamu Mizuno; Tohru Kikuno

    2005-01-01

    Recent software systems contain many functions to provide various services. Because of this tendency, it is difficult to ensure software quality and to eliminate crucial faults with conventional software testing methods. Taking the effect of a test engineer's skill on test item generation into consideration, we propose a new test item generation method that supports the generation of test items for illegal behavior of the system. The proposed method can generate test items based on use-case analysis, deviation analysis of legal behavior, and fault tree analysis of system fault situations. From the results of experimental applications of our method, we confirmed that test items for illegal behavior of a system were generated effectively, and also that the proposed method can effectively assist test item generation by an engineer with a low skill level.

  11. Solving Verbal Analogies: Some Cognitive Components of Intelligence Test Items

    Science.gov (United States)

    Whitely, Susan E.

    1976-01-01

    The results indicate that although relational concepts influence the cognitive aptitudes which are reflected in analogy item performance, success in solving analogies does not depend on individual differences in some major aspects of processing relationships. (Author/DEP)

  12. International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

    Science.gov (United States)

    Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

    2016-01-01

    We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Programme for International Student Assessment (PISA) 2009 data, we examined the correlation between the difficulty of science items and the complexity of their illustrations. We observed statistically…

  13. The "Sniffin' Kids" test--a 14-item odor identification test for children.

    Directory of Open Access Journals (Sweden)

    Valentin A Schriever

    Tools for measuring olfactory function in adults have been well established. Although studies have shown that olfactory impairment in children may occur as a consequence of a number of diseases or head trauma, to date no consensus exists in Europe on how to evaluate the sense of smell in children. The aim of the study was to develop a modified "Sniffin' Sticks" odor identification test, the "Sniffin' Kids" test, for use in children. The study included 537 children between 6 and 17 years of age. Fourteen odors that were identified at a high rate by children were selected from the "Sniffin' Sticks" 16-item odor identification test. Normative data for the 14-item "Sniffin' Kids" odor identification test were obtained. The test was validated by including a group of children with congenital anosmia. Results show that the "Sniffin' Kids" test is able to discriminate between normosmia and anosmia with a cutoff value of >7 points on the odor identification test. In addition, the test-retest reliability was investigated in a group of 31 healthy children and shown to be ρ = 0.44. With the 14-item "Sniffin' Kids" odor identification test, we present a valid and reliable test for measuring olfactory function in children aged 6-17 years.

  14. Test-taker perception of what test items measure: a potential impact of face validity on student learning

    National Research Council Canada - National Science Library

    Sato, Takanori; Ikeda, Naoki

    2015-01-01

    ... agree. University students in Japan and Korea (N = 179) were given past entrance examinations administered in the respective countries and asked to read test items and record what ability they thought each item...

  15. Detecting Differential Item Functioning and Differential Test Functioning on Math School Final-exam

    Directory of Open Access Journals (Sweden)

    - Mansyur

    2016-08-01

    This study aims to identify the characteristics of differential item functioning (DIF) and differential test functioning (DTF) on a school final exam in mathematics, based on item response theory (IRT). The subjects of this study were the exam questions and all of the students' answer sheets, chosen using a convenience sampling method; 286 responses were obtained, from 147 male and 149 female students. The data were collected using a documentation technique, transcribing the responses of the mathematics school final-exam participants. The data were analysed with an IRT approach, using a two-parameter (2P) model with Lord's chi-square DIF method. The study showed that of the 40 question items analysed under item response theory (IRT), ten items exhibited gender DIF and 13 items exhibited location (area) DIF. The differential test functioning (DTF) was found to favour female examinees.
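
    Lord's chi-square method named above compares an item's estimated parameters between two groups, weighting the difference by the combined sampling covariance. A hedged sketch for a two-parameter item; the estimates and covariances below are invented for illustration:

```python
import numpy as np

def lords_chi_square(est_ref, est_focal, cov_ref, cov_focal):
    """Lord's chi-square for one 2P item: d' * inv(S) * d, where d is the
    difference of the (a, b) estimates between groups and S is the sum of
    their covariance matrices. Compared against chi-square with 2 df."""
    d = np.asarray(est_ref, float) - np.asarray(est_focal, float)
    s = np.asarray(cov_ref, float) + np.asarray(cov_focal, float)
    return float(d @ np.linalg.inv(s) @ d)

# Invented (discrimination a, difficulty b) estimates per group:
chi2 = lords_chi_square((1.2, 0.3), (0.9, 0.8), np.eye(2) * 0.02, np.eye(2) * 0.02)
flagged = chi2 > 5.99  # exceeds the 0.05 critical value with 2 df -> DIF
```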

  16. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    Science.gov (United States)

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included PARSCALE's G²,…

  17. V-TECS Criterion-Referenced Test Item Bank for Radiologic Technology Occupations.

    Science.gov (United States)

    Reneau, Fred; And Others

    This Vocational-Technical Education Consortium of States (V-TECS) criterion-referenced test item bank provides 696 multiple-choice items and 33 matching items for radiologic technology occupations. These job titles are included: radiologic technologist, chief; radiologic technologist; nuclear medicine technologist; radiation therapy technologist;…

  18. The Impact of Discourse Features of Science Test Items on ELL Performance

    Science.gov (United States)

    Kachchaf, Rachel; Noble, Tracy; Rosebery, Ann; Wang, Yang; Warren, Beth; O'Connor, Mary Catherine

    2014-01-01

    Most research on linguistic features of test items negatively impacting English language learners' (ELLs') performance has focused on lexical and syntactic features, rather than discourse features that operate at the level of the whole item. This mixed-methods study identified two discourse features in 162 multiple-choice items on a standardized…

  19. Preliminary results of a proficiency testing of industrial CT scanners using small polymer items

    DEFF Research Database (Denmark)

    Angel, Jais Andreas Breusch; Cantatore, Angela; De Chiffre, Leonardo

    2012-01-01

    This work presents preliminary results concerning a proficiency testing for intercomparison of industrial CT scanners. Two audit items, similar to common industrial parts, were selected for circulation. The two items were a single polymer complex geometry part and a simple geometry item made of two...

  20. Evaluating innovative items for the NCLEX, part I: usability and pilot testing.

    Science.gov (United States)

    Wendt, Anne; Harmes, J Christine

    2009-01-01

    National Council of State Boards of Nursing (NCSBN) has recently conducted preliminary research on the feasibility of including various types of innovative test questions (items) on the NCLEX. This article focuses on the participants' reactions to and their strategies for interacting with various types of innovative items. Part 2 in the May/June issue will focus on the innovative item templates and evaluation of the statistical characteristics and the level of cognitive processing required to answer the examination items.

  1. PENGEMBANGAN DAN ANALISIS SOAL ULANGAN KENAIKAN KELAS KIMIA SMA KELAS X BERDASARKAN CLASSICAL TEST THEORY DAN ITEM RESPONSE THEORY

    Directory of Open Access Journals (Sweden)

    Mr Nahadi

    2011-10-01

    This research, titled "Test Development and Analysis of a First-Grade Senior High School Final Examination in Chemistry Based on Classical Test Theory and Item Response Theory", was conducted to develop a standardized test instrument for the first-grade senior high school final examination, analysed under both classical test theory and item response theory. The test is a multiple-choice test consisting of 75 items, each with five options. The research method is the research-and-development method, aimed at producing test items that fulfil item criteria such as validity, reliability, item discrimination, item difficulty, and distractor quality under classical test theory, and validity, reliability, item discrimination, item difficulty, and pseudo-guessing under item response theory. The three-parameter item response theory model is used in this research. The research-and-development method was carried out up to a preliminary field test with 102 first-grade senior high school students. Based on the results, the test fulfils the criteria for a good instrument under both classical test theory and item response theory. The final examination test items vary in quality, so some of them need revision of both the stem and the options. Of the 75 test items, 21 were rejected and 54 accepted.
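
    The classical-test-theory item statistics discussed above (difficulty as the proportion correct, discrimination as an item-total relationship) can be sketched as follows; the response matrix is invented, and a corrected point-biserial (item vs. rest-of-test score) stands in for whichever discrimination index the study used:

```python
import numpy as np

def item_analysis(responses):
    """responses: examinees x items matrix of 0/1 scores.
    Returns per-item difficulty p and a corrected point-biserial
    discrimination (item score vs. total score excluding that item)."""
    x = np.asarray(responses, float)
    p = x.mean(axis=0)                       # proportion correct per item
    disc = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        rest = x.sum(axis=1) - x[:, j]       # total without the item itself
        disc[j] = np.corrcoef(x[:, j], rest)[0, 1]
    return p, disc

p, disc = item_analysis([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]])
```

    Items with very high or very low p, or with discrimination near zero or negative, are the usual candidates for revision or rejection.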

  2. Development of the Basic Core Test Items of National Nurse's License Examination

    Directory of Open Access Journals (Sweden)

    Cho Ja Kim

    2004-01-01

    The purpose of this study is to develop a classification framework for the test elements of the National Registered Nurse's License Examination, to divide the test items into standard and basic core items on the basis of the RN's job descriptions, and to identify an adequate proportion of basic core test items. Method and results: In order to develop the classification framework of the National Registered Nurse's License Examination, RN job descriptions, nursing standards, and the specific learning objectives of nursing courses were reviewed, and a survey was used to identify which source would be appropriate as a reference for the basic core test items. 146 professors from schools of nursing and members of each division of the Korean Academic Society of Nursing (KASN) participated in the survey. The study showed that 98% of respondents agreed to use RN job descriptions in selecting the basic core test items and that 30% would be an appropriate proportion for the basic core test. The contents, the selection criteria, and the proportion of the basic core test items were developed by the members of this research team, the members of the National RN's License Examination subcommittee, and the presidents of each division of KASN. A total of 1990 standard test items were selected from among 3524 items that 3 out of 7 members of the research team agreed to choose. Duplicated items among the standard items were deleted. 205 items out of the 1990 standard items were selected as basic core test items, and 14 items were added in Medical Laws and Ethics, bringing the total to 219 basic core test items. In conclusion, 99 items, 30% of the current examination's total items, were chosen as the final basic core test items using the Delphi method. Further studies are needed to validate the current National License Examination for RNs on the basis of the 99 basic core test items.

  3. Difficulty and Discrimination Parameters of Boston Naming Test Items in a Consecutive Clinical Series

    Science.gov (United States)

    Pedraza, Otto; Sachs, Bonnie C.; Ferman, Tanis J.; Rush, Beth K.; Lucas, John A.

    2011-01-01

    The Boston Naming Test is one of the most widely used neuropsychological instruments; yet, there has been limited use of modern psychometric methods to investigate its properties at the item level. The current study used item response theory (IRT) to examine each item's difficulty and discrimination properties, as well as the test's measurement precision across the range of naming ability. Participants included 300 consecutive referrals to the outpatient neuropsychology service at Mayo Clinic in Florida. Results showed that successive items do not necessarily reflect a monotonic increase in psychometric difficulty, some items are inadequate to distinguish individuals at various levels of naming ability, multiple items provide redundant psychometric information, and measurement precision is greatest for persons within a low-average range of ability. These findings may be used to develop short forms, improve reliability in future test versions by replacing psychometrically poor items, and analyze profiles of intra-individual variability. PMID:21593059
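
    Under a two-parameter logistic (2PL) model of the kind used in such item-level analyses, an item's difficulty b and discrimination a determine both its response curve and where on the ability scale it measures most precisely. A brief sketch with invented parameters:

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item; peaks where theta == b."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# A discriminating item (a = 1.5) centred at average ability (b = 0)
# contributes most measurement precision near theta = 0:
peak = info_2pl(0.0, 1.5, 0.0)
off_target = info_2pl(2.0, 1.5, 0.0)  # far from b: much less information
```

    Summing item information over a test shows where measurement precision concentrates, which is how findings like "precision is greatest in the low-average range" are established.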

  5. Caution warranted in extrapolating from Boston Naming Test item gradation construct.

    Science.gov (United States)

    Beattey, Robert A; Murphy, Hilary; Cornwell, Melinda; Braun, Thomas; Stein, Victoria; Goldstein, Martin; Bender, Heidi Allison

    2017-01-01

    The Boston Naming Test (BNT) was designed to present items in order of difficulty based on word frequency. Changes in word frequencies over time, however, would frustrate extrapolation in clinical and research settings based on the theoretical construct because performance on the BNT might reflect changes in ecological frequency of the test items, rather than performance across items of increasing difficulty. This study identifies the ecological frequency of BNT items at the time of publication using the American Heritage Word Frequency Book and determines changes in frequency over time based on the frequency distribution of BNT items across a current corpus, the Corpus of Contemporary American English. Findings reveal an uneven distribution of BNT items across 2 corpora and instances of negligible differentiation in relative word frequency across test items. As BNT items are not presented in order from least to most frequent, clinicians and researchers should exercise caution in relying on the BNT as presenting items in increasing order of difficulty. A method is proposed for distributing confrontation-naming items to be explicitly measured against test items that are normally distributed across the corpus of a given language.
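
    The gradation check described above amounts to asking how well an item sequence tracks corpus frequency rank, for example via a Spearman rank correlation against presentation order. A rough sketch with invented per-million frequencies (tie handling omitted):

```python
def frequency_ranks(freqs):
    """Rank items by corpus frequency, 1 = most frequent (no tie handling)."""
    order = sorted(range(len(freqs)), key=lambda i: -freqs[i])
    ranks = [0] * len(freqs)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks

def order_agreement(freqs):
    """Spearman rho between presentation order and frequency rank;
    rho near 1 means the items really do run from frequent to rare."""
    r = frequency_ranks(freqs)
    n = len(freqs)
    d2 = sum((r[i] - (i + 1)) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented frequencies, listed in the order the items are administered:
rho = order_agreement([120.0, 95.0, 40.0, 55.0, 11.0, 8.0, 1.5])
```

    Recomputing rho against a current corpus versus the corpus available at publication is one way to quantify how much the intended difficulty ordering has drifted.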

  6. Effects of Reducing the Cognitive Load of Mathematics Test Items on Student Performance

    Directory of Open Access Journals (Sweden)

    Susan C. Gillmor

    2015-01-01

    This study explores a new item-writing framework for improving the validity of math assessment items. The authors transfer insights from Cognitive Load Theory (CLT, traditionally used in instructional design, to educational measurement. Fifteen, multiple-choice math assessment items were modified using research-based strategies for reducing extraneous cognitive load. An experimental design with 222 middle-school students tested the effects of the reduced cognitive load items on student performance and anxiety. Significant findings confirm the main research hypothesis that reducing the cognitive load of math assessment items improves student performance. Three load-reducing item modifications are identified as particularly effective for reducing item difficulty: signalling important information, aesthetic item organization, and removing extraneous content. Load reduction was not shown to impact student anxiety. Implications for classroom assessment and future research are discussed.

  7. [Difference analysis among majors in medical parasitology exam papers by test item bank proposition].

    Science.gov (United States)

    Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu

    2012-04-30

    The quality indices of the "Medical Parasitology" exam papers for students in three majors at the university in 2010, together with the measured data, were compared and analyzed. The exam papers were generated from a test item bank. The alpha reliability coefficients of the three exam papers were all above 0.70. The knowledge structure and ability structure of the exam papers were largely balanced, but the alpha reliability coefficient of the second major's paper was the lowest, mainly due to the quality of the test items in that paper and the failure to revise the indices of the test item bank in time. This observation demonstrates that revising the test items and their indices in the item bank according to the measured data can improve the quality of item bank based test assembly and reduce differences among exam papers.
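
    The alpha reliability coefficients compared above are Cronbach's alpha, computable from the item-score variances and the total-score variance. A small sketch with invented dichotomous scores:

```python
def cronbach_alpha(scores):
    """scores: examinees x items matrix (list of lists).
    alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(scores[0])

    def pvar(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(pvar(col) for col in zip(*scores))
    total_var = pvar([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

alpha = cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```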

  8. A-Stratified Computerized Adaptive Testing with Unequal Item Exposure across Strata.

    Science.gov (United States)

    Deng, Hui; Chang, Hua-Hua

    The purpose of this study was to compare a proposed revised a-stratified, or alpha-stratified, USTR method of test item selection with the original alpha-stratified multistage computerized adaptive testing approach (STR) and the use of maximum Fisher information (FSH) with respect to test efficiency and item pool usage using simulated computerized…

  9. The construction of parallel tests from IRT-based item banks

    NARCIS (Netherlands)

    Boekkooi-Timminga, Ellen

    1989-01-01

    The construction of parallel tests from item response theory (IRT) based item banks is discussed. Tests are considered parallel whenever their information functions are identical. After the methods for constructing parallel tests are considered, the computational complexity of 0-1 linear programming
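
    Matching information functions can be posed as 0-1 linear programming over the item bank, as the abstract notes. The greedy one-ability-point sketch below is not the dissertation's method and uses an invented bank; it only illustrates the objective of driving summed information toward a target:

```python
def assemble(bank_info, target, length):
    """Greedy sketch: choose `length` items from `bank_info`
    (item id -> information at one reference ability point) so the summed
    information approaches `target`. Real parallel-test assembly solves
    this as a 0-1 linear program over several ability points at once."""
    remaining = dict(bank_info)
    chosen, total = [], 0.0
    for _ in range(length):
        # pick the item that brings the running total closest to target
        best = min(remaining, key=lambda i: abs(target - (total + remaining[i])))
        total += remaining.pop(best)
        chosen.append(best)
    return chosen, total

bank = {"i1": 0.5, "i2": 0.4, "i3": 0.3, "i4": 0.2, "i5": 0.1}
chosen, total = assemble(bank, target=1.0, length=3)
```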

  10. The Impact of Test Dimensionality, Common-Item Set Format, and Scale Linking Methods on Mixed-Format Test Equating

    Science.gov (United States)

    Öztürk-Gübes, Nese; Kelecioglu, Hülya

    2016-01-01

    The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…

  11. Using cognitive interviewing for test items to assess physical function in children with cerebral palsy.

    Science.gov (United States)

    Dumas, Helene M; Watson, Kyle; Fragala-Pinkham, Maria A; Haley, Stephen M; Bilodeau, Nathalie; Montpetit, Kathleen; Gorton, George E; Mulcahey, M J; Tucker, Carole A

    2008-01-01

    The purpose of this study was to assess the content, format, and comprehension of test items and responses developed for use in a computer adaptive test (CAT) of physical function for children with cerebral palsy (CP). After training in cognitive interviewing techniques, investigators defined item intent and developed questions for each item. Parents of children with CP (n = 27) participated in interviews probing item meaning, item wording, and response choice adequacy and appropriateness. Qualitative analysis identified 3 themes: item clarity; relevance, context, and attribution; and problems with wording or tone. Parents reported the importance of delineating task components, assistance amount, and environmental context. Cognitive interviewing provided valuable information about the validity of new items and insight to improve relevance and context. We believe that the development of CATs in pediatric rehabilitation may ultimately reduce the impact of the issues identified.

  12. The Development, Validation and Application of an External Criterion Measure of Achievement Test Item Bias.

    Science.gov (United States)

    Harms, Robert A.

    Based on John Rawls' theory of justice as fairness, a nine-item rating scale was developed to serve as a criterion in studies of test item bias. Two principles underlie the scale: (1) Within a defined usage, test items should not affect students so that they are unable to do as well as their abilities would indicate; and (2) within the domain of a…

  14. Small-Item Vapor Test Method, FY11 Release

    Science.gov (United States)

    2012-07-01

    exposed. sessile drop: a liquid droplet that is firmly attached to a surface. If the droplet significantly spreads across the surface, it is better… following information regarding the observations: a written description of the applied drops as they appeared for each sample (e.g., sessile, spread)… techniques for this method. The airflow and air volume are key variables required to assess risk. Residual agent: because full-item extraction…

  15. Origin bias of test items compromises the validity and fairness of curriculum comparisons.

    Science.gov (United States)

    Muijtjens, Arno M M; Schuwirth, Lambert W T; Cohen-Schotanus, Janke; van der Vleuten, Cees P M

    2007-12-01

    To determine whether items of progress tests used for inter-curriculum comparison favour students from the medical school where the items were produced (i.e. whether the origin bias of test items is a potential confounder in comparisons between curricula). We investigated scores of students from different schools on subtests consisting of progress test items constructed by authors from the different schools. In a cross-institutional collaboration between 3 medical schools, progress tests are jointly constructed and simultaneously administered to all students at the 3 schools. Test score data for 6 consecutive progress tests were investigated. Participants consisted of approximately 5000 undergraduate medical students from 3 medical schools. The main outcome measure was the difference between the scores on subtests of items constructed by authors from 2 of the collaborating schools (subtest difference score). The subtest difference scores showed that students obtained better results on items produced at their own schools. This effect was more pronounced in Years 2-5 of the curriculum than in Year 1, and diminished in Year 6. Progress test items were subject to origin bias. As a consequence, all participating schools should contribute equal numbers of test items if tests are to be used for valid and fair inter-curriculum comparisons.
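
    The subtest difference score described above can be sketched in a few lines; the data layout, field names, and scoring below are invented for illustration and may differ from the study's actual computation:

```python
# Sketch: "subtest difference score" for one student, comparing
# proportion-correct on items authored at two schools. The data layout
# and field names are invented, not the study's actual pipeline.
def subtest_difference(responses, item_origin, own_school, other_school):
    """responses: item_id -> 0/1; item_origin: item_id -> authoring school."""
    def prop_correct(school):
        items = [i for i in responses if item_origin[i] == school]
        return sum(responses[i] for i in items) / len(items)
    return prop_correct(own_school) - prop_correct(other_school)

item_origin = {"q1": "A", "q2": "A", "q3": "B", "q4": "B"}
student = {"q1": 1, "q2": 1, "q3": 1, "q4": 0}  # a school-A student
print(subtest_difference(student, item_origin, "A", "B"))  # 0.5
```

    A positive value means the student scored better on items authored at their own school, which is the pattern the study reports.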

  16. Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

    Science.gov (United States)

    Wyse, Adam E.; Albano, Anthony D.

    2015-01-01

    This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…

  17. Applications of NLP Techniques to Computer-Assisted Authoring of Test Items for Elementary Chinese

    Science.gov (United States)

    Liu, Chao-Lin; Lin, Jen-Hsiang; Wang, Yu-Chun

    2010-01-01

    The authors report an implemented environment for computer-assisted authoring of test items and provide a brief discussion about the applications of NLP techniques for computer assisted language learning. Test items can serve as a tool for language learners to examine their competence in the target language. The authors apply techniques for…

  18. Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

    Science.gov (United States)

    Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

    2010-01-01

    This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…

  19. The Value of Self-Test Items in Tape-Slide Instruction

    Science.gov (United States)

    Jones, Hazel C.

    1976-01-01

    Two tape-slide sequences in general pathology were used in an experiment to assess the value of self-test items and to determine whether it is better to intersperse the self-test items in the program or place them all at the end of the sequence. A highly favorable attitude to the latter method is reported. (Author/LBH)

  20. The Comparability of the Statistical Characteristics of Test Items Generated by Computer Algorithms.

    Science.gov (United States)

    Meisner, Richard; And Others

    This paper presents a study on the generation of mathematics test items using algorithmic methods. The history of this approach is briefly reviewed and is followed by a survey of the research to date on the statistical parallelism of algorithmically generated mathematics items. Results are presented for 8 parallel test forms generated using 16…

  1. A review of methods for evaluating the fit of item score patterns on a test

    NARCIS (Netherlands)

    Meijer, R.R.; Sijtsma, Klaas

    1999-01-01

    Methods are discussed that can be used to investigate the fit of an item score pattern to a test model. Model-based tests and personality inventories are administered to more than 100 million people a year and, as a result, individual fit is of great concern. Item Response Theory (IRT) modeling and

  3. A Preliminary Analysis of the Linguistic Complexity of Numeracy Skills Test Items for Pre Service Teachers

    Science.gov (United States)

    O'Keeffe, Lisa

    2016-01-01

    Language is frequently discussed as barrier to mathematics word problems. Hence this paper presents the initial findings of a linguistic analysis of numeracy skills test sample items. The theoretical perspective of multi-modal text analysis underpinned this study, in which data was extracted from the ten sample numeracy test items released by the…

  4. A Method for Generating Educational Test Items That Are Aligned to the Common Core State Standards

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis; Hogan, James B.; Matovinovic, Donna

    2015-01-01

    The demand for test items far outstrips the current supply. This increased demand can be attributed, in part, to the transition to computerized testing, but, it is also linked to dramatic changes in how 21st century educational assessments are designed and administered. One way to address this growing demand is with automatic item generation.…

  5. Relationships among Classical Test Theory and Item Response Theory Frameworks via Factor Analytic Models

    Science.gov (United States)

    Kohli, Nidhi; Koran, Jennifer; Henn, Lisa

    2015-01-01

    There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…

  7. Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

    Science.gov (United States)

    Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

    2015-12-01

    The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity.
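
    Two of the statistics reported above, Cronbach's alpha and Cohen's d, can be computed from scratch. This is a generic sketch with invented toy data, not the study's analysis code, and the pooled SD in `cohens_d` is a simplification:

```python
# Generic internal-consistency (Cronbach's alpha) and effect-size
# (Cohen's d) computations; the data shapes below are invented.
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per item, same respondents in each."""
    k = len(item_scores)
    item_vars = sum(statistics.pvariance(s) for s in item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]
    return k / (k - 1) * (1 - item_vars / statistics.pvariance(totals))

def cohens_d(pre, post):
    """Standardized mean difference; uses a simplified common SD."""
    sd = statistics.stdev(list(pre) + list(post))
    return (statistics.mean(pre) - statistics.mean(post)) / sd

# Perfectly consistent items yield alpha = 1.0:
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```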

  8. Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

    Science.gov (United States)

    Ilich, Maria O.

    Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
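
    The matching logic underlying DIF screening can be illustrated with a much-simplified standardization-style index. This is neither SIBTest nor the logistic-regression procedure used in the study, and the data and group labels are invented:

```python
# Much-simplified standardization-style DIF index: match examinees on
# total score, compare item proportion-correct between groups within
# each stratum, and average with focal-group weights. This is neither
# SIBTest nor logistic-regression DIF; data and labels are invented.
from collections import defaultdict

def std_dif_index(records):
    """records: (group, total_score, item_correct) triples, group in
    {'ref', 'foc'}. Negative values mean the item disfavors 'foc'."""
    strata = defaultdict(lambda: {"ref": [], "foc": []})
    for group, total, correct in records:
        strata[total][group].append(correct)
    num = den = 0.0
    for cell in strata.values():
        if cell["ref"] and cell["foc"]:
            w = len(cell["foc"])
            num += w * (sum(cell["foc"]) / len(cell["foc"])
                        - sum(cell["ref"]) / len(cell["ref"]))
            den += w
    return num / den

records = [("ref", 5, 1), ("ref", 5, 1), ("foc", 5, 1), ("foc", 5, 0)]
print(std_dif_index(records))  # -0.5: item harder for the matched 'foc' group
```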

  9. The value of self-test items in tape--slide instruction.

    Science.gov (United States)

    Jones, H C

    1976-07-01

    Two tape-slide sequences in general pathology were used in an experiment to assess the value of self-test items and to determine whether it is better to intersperse the self-test items in the programme or place them all at the end of the sequence. The programmes were presented to a random sample of thirty-six students from a class of 149 in one of three forms: version 1, tape-slide programme without self-test items; version 2, tape-slide programme with self-test items interspersed between sections of the sequence; and version 3, tape-slide programme with all self-test items at the end of the sequence. Each student worked a pre-test before studying versions 1, 2 or 3 of a programme. A week later they worked through the post-test which was identical to the pre-test. At the same time they filled in a short attitude questionnaire on the teaching method. All students learned from the programmes. There was improvement in the post-test on the mean pre-test scores for all versions of both programmes. For one programme there was no significant difference between the mean post-test scores for students studying versions 1, 2 or 3, but for the other programme there was a significant difference between the versions. In this case the inclusion of self-test items was better for learning than no self-test items, and it was better if the self-test items were placed at the end of the sequence. A highly favourable attitude to the method is reported.

  10. The effects of linguistic modification on ESL students' comprehension of nursing course test items.

    Science.gov (United States)

    Bosher, Susan; Bowles, Melissa

    2008-01-01

    Recent research has indicated that language may be a source of construct-irrelevant variance for non-native speakers of English, or English as a second language (ESL) students, when they take exams. As a result, exams may not accurately measure knowledge of nursing content. One accommodation often used to level the playing field for ESL students is linguistic modification, a process by which the reading load of test items is reduced while the content and integrity of the item are maintained. Research on the effects of linguistic modification has been conducted on examinees in the K-12 population, but is just beginning in other areas. This study describes the collaborative process by which items from a pathophysiology exam were linguistically modified and subsequently evaluated for comprehensibility by ESL students. Findings indicate that in a majority of cases, modification improved examinees' comprehension of test items. Implications for test item writing and future research are discussed.

  11. The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

    Science.gov (United States)

    Schilling, Stephen G.

    2007-01-01

    In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…

  12. A Stochastic Method for Balancing Item Exposure Rates in Computerized Classification Tests

    Science.gov (United States)

    Huebner, Alan; Li, Zhushan

    2012-01-01

    Computerized classification tests (CCTs) classify examinees into categories such as pass/fail, master/nonmaster, and so on. This article proposes the use of stochastic methods from sequential analysis to address item overexposure, a practical concern in operational CCTs. Item overexposure is traditionally dealt with in CCTs by the Sympson-Hetter…

  13. Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing

    Science.gov (United States)

    Wang, Chun; Chang, Hua-Hua; Huebner, Alan

    2011-01-01

    This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in…
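
    The information indices that such selection rules build on can be illustrated with plain maximum Fisher-information selection under the 2PL model. This baseline sketch is not the restrictive progressive or restrictive threshold method itself, and the item parameters are invented:

```python
# Maximum Fisher-information item selection under the 2PL model; a
# baseline sketch, not the restrictive progressive/threshold methods.
# The (a, b) item parameters below are invented.
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_item(theta, pool, administered):
    """Greedily pick the most informative unused item at ability theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: fisher_info(theta, *pool[i]))

pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 2.0)]  # (a, b) pairs
print(pick_item(0.0, pool, set()))  # 1: its difficulty b = 0 matches theta
```

    The stochastic methods in the paper perturb exactly this kind of greedy rule so that the most informative items are not overexposed.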

  14. Detection of aberrant item score patterns in computerized adaptive testing : An empirical example using the CUSUM

    NARCIS (Netherlands)

    Egberink, Iris J. L.; Meijer, Rob R.; Veldkamp, Bernard P.; Schakel, Lolle; Smid, Nico G.

    2010-01-01

    The scalability of individual trait scores on a computerized adaptive test (CAT) was assessed through investigating the consistency of individual item score patterns. A sample of N = 428 persons completed a personality CAT as part of a career development procedure. To detect inconsistent item score
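
    The CUSUM idea, accumulating residuals between observed and model-expected item scores until a bound is crossed, can be sketched as follows (the threshold and probabilities are illustrative, not the paper's calibrated values):

```python
# Sketch of a CUSUM person-fit check: accumulate residuals between
# observed item scores (0/1) and model-expected probabilities; flag the
# response pattern if either cumulative sum drifts past a bound.
def cusum_flag(observed, expected, threshold=2.0):
    upper = lower = 0.0
    for x, p in zip(observed, expected):
        resid = x - p
        upper = max(0.0, upper + resid)  # "better than expected" runs
        lower = min(0.0, lower + resid)  # "worse than expected" runs
        if upper > threshold or lower < -threshold:
            return True
    return False

# A person who keeps failing items the model says they should pass:
print(cusum_flag([0, 0, 0, 0, 0], [0.9] * 5))  # True
```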

  15. The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory

    Science.gov (United States)

    Anil, Duygu

    2008-01-01

    In this study, the power of experts' predictions of item characteristics, for conditions under which try-out administrations cannot be applied, was examined against item characteristics computed according to classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…

  16. Criterion-Referenced Test (CRT) Items for Air Conditioning, Heating and Refrigeration.

    Science.gov (United States)

    Davis, Diane, Ed.

    These criterion-referenced test (CRT) items for air conditioning, heating, and refrigeration are keyed to the Missouri Air Conditioning, Heating, and Refrigeration Competency Profile. The items are designed to work with both the Vocational Instructional Management System and Vocational Administrative Management System. For word processing and…

  17. Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

    Science.gov (United States)

    Perkins, Kyle; And Others

    1995-01-01

    This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)

  18. Optimal stratification of item pools in α-stratified computerized adaptive testing

    NARCIS (Netherlands)

    Chang, Hua-Hua; Linden, van der Wim J.

    2003-01-01

    A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item
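
    For intuition, α-stratification itself can be sketched as a simple quantile split on the discrimination parameter; the paper's contribution is an optimal 0-1 LP formulation, which this sketch does not implement:

```python
# alpha-stratification as a plain quantile split on the discrimination
# parameter a: low-a strata feed early CAT stages, high-a strata later
# ones. This conveys the basic idea only, not the paper's 0-1 LP optimum.
def alpha_stratify(a_params, k):
    order = sorted(range(len(a_params)), key=lambda i: a_params[i])
    size = len(a_params) // k
    return [order[j * size:(j + 1) * size] for j in range(k)]

# Six items, three strata of two items each (a-values invented):
print(alpha_stratify([0.5, 1.5, 0.9, 2.0, 0.7, 1.2], 3))
# [[0, 4], [2, 5], [1, 3]]
```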

  19. Studying Differential Item Functioning via Latent Variable Modeling: A Note on a Multiple-Testing Procedure

    Science.gov (United States)

    Raykov, Tenko; Marcoulides, George A.; Lee, Chun-Lung; Chang, Chi

    2013-01-01

    This note is concerned with a latent variable modeling approach for the study of differential item functioning in a multigroup setting. A multiple-testing procedure that can be used to evaluate group differences in response probabilities on individual items is discussed. The method is readily employed when the aim is also to locate possible…

  20. Modeling and Testing Differential Item Functioning in Unidimensional Binary Item Response Models with a Single Continuous Covariate: A Functional Data Analysis Approach.

    Science.gov (United States)

    Liu, Yang; Magnus, Brooke E; Thissen, David

    2016-06-01

    Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure is evaluated via Monte Carlo simulations. If anchor items are available, we proposed an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate-all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.
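
    The permutation-testing idea can be conveyed with a generic permutation test for a difference in group means; the paper permutes a DIF statistic under a semiparametric model, so this sketch only illustrates the resampling logic, with invented data:

```python
# Generic permutation test for a difference in group means; conveys the
# resampling logic behind a permutation DIF test. Data and seed invented.
import random
import statistics

def perm_test(x, y, n_perm=2000, seed=7):
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(statistics.mean(px) - statistics.mean(py)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing of the p-value
```

    A large observed difference is rarely matched by shuffled group labels, so the p-value is small; identical groups give a p-value of 1.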

  1. Item selection and ability estimation in adaptive testing

    NARCIS (Netherlands)

    Linden, van der Wim J.; Pashley, Peter J.; Linden, van der Wim J.; Glas, Cees A.W.

    2010-01-01

    The last century saw a tremendous progression in the refinement and use of standardized linear tests. The first administered College Board exam occurred in 1901 and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, progressively more sophisticated standardized linear tests have…

  2. Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

    Science.gov (United States)

    Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

    2012-01-01

    Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

  3. Development of an item bank for computerized adaptive test (CAT) measurement of pain

    DEFF Research Database (Denmark)

    Petersen, Morten Aa.; Aaronson, Neil K; Chie, Wei-Chu

    2016-01-01

    PURPOSE: Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. METHODS: The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. RESULTS: In step 1, we identified 337 pain… were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale…

  4. The quadratic relationship between difficulty of intelligence test items and their correlations with working memory

    Directory of Open Access Journals (Sweden)

    Tomasz Smoleń

    2015-08-01

    Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf test, the Raven test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.

  5. The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

    Science.gov (United States)

    Smolen, Tomasz; Chuderski, Adam

    2015-01-01

    Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
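
    The attenuation mechanism described above, item-ability correlations shrinking as an item approaches ceiling, can be demonstrated with a small simulation; the thresholds, sample size, and seed below are invented for illustration:

```python
# Simulation of the floor/ceiling attenuation mechanism: the correlation
# between latent ability and a pass/fail item score shrinks when nearly
# everyone passes the item. Thresholds, n, and seed are invented.
import random
import statistics

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

def item_ability_corr(threshold, n=20000, seed=1):
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scores = [1.0 if a > threshold else 0.0 for a in ability]
    return pearson(ability, scores)

r_mid = item_ability_corr(0.0)       # item of middling difficulty
r_ceiling = item_ability_corr(-2.0)  # near-ceiling item: ~98% pass
print(r_ceiling < r_mid)  # True: the ceiling attenuates the correlation
```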

  6. Air Force Officer Qualifying Test Form T: Initial Item-, Test-, Factor-,and Composite-Level Analyses

    Science.gov (United States)

    2016-12-01

    AFRL-RH-WP-TR-2016-0093: Air Force Officer Qualifying Test Form T: Initial Item-, Test-, Factor-, and Composite-Level Analyses. Reporting period: July 2016 – 28 Nov 2016. Contract number: FA8650-11-C-6158; program element: 62202F.

  7. Quantitative Penetration Testing with Item Response Theory (extended version)

    NARCIS (Netherlands)

    Arnold, F.; Pieters, W.; Stoelinga, M.I.A.

    2013-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Thus, penetration testing has so far been used as a qualitative research method. To enable quantitative approaches to security risk management, including

  8. Quantitative penetration testing with item response theory (extended version)

    NARCIS (Netherlands)

    Arnold, Florian; Pieters, Wolter; Stoelinga, Mariëlle

    2013-01-01

    Existing penetration testing approaches assess the vulnerability of a system by determining whether certain attack paths are possible in practice. Therefore, penetration testing has thus far been used as a qualitative research method. To enable quantitative approaches to security risk management, in

  9. Comparison of Procedures for Detecting Test-Item Bias with Both Internal and External Ability Criteria.

    Science.gov (United States)

    Shepard, Lorrie; And Others

    1981-01-01

    Sixteen approaches for detecting item bias were compared on samples of Black, White, and Chicano elementary school pupils using the Lorge-Thorndike and Raven's Coloured Progressive Matrices tests. Recommendations for practical use are made. (JKS)

  10. Can a Multidimensional Test Be Evaluated with Unidimensional Item Response Theory?

    Science.gov (United States)

    Wiberg, Marie

    2012-01-01

    The aim of this study was to evaluate possible consequences of using unidimensional item response theory (UIRT) on a multidimensional college admission test. The test consists of 5 subscales and can be divided into two sections, that is, it can be considered both as a unidimensional and a multidimensional test. The test was examined with both UIRT…

  11. [Reference Intervals of Standard Test Items in Ningen Dock Examination].

    Science.gov (United States)

    Yamakado, Minoru

    2016-03-01

    Reference intervals (RIs) were derived from records of 1,499,288 individuals who underwent ningen dock examination in 188 institutes which belong to the Japan Society of Ningen Dock in 2012. Targets were 27 basic laboratory tests, including the body mass index (BMI) and systolic and diastolic blood pressures (SBP, DBP). Individuals fulfilling strict criteria were chosen: SBP … dock results will enable the appropriate interpretation of test results in health screening, and promote the effective application of CDLs for therapeutic intervention, taking into account the sex, age, and other health attributes.
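
    A nonparametric reference interval of the kind derived in such studies is simply the central 95% of the reference sample. The sketch below uses linearly interpolated percentiles and is a generic illustration, not the study's exact derivation procedure:

```python
# Nonparametric reference interval: the central 95% of a reference
# sample, via linearly interpolated 2.5th and 97.5th percentiles.
def reference_interval(values, lower=0.025, upper=0.975):
    s = sorted(values)
    def pct(q):
        idx = q * (len(s) - 1)
        lo = int(idx)
        frac = idx - lo
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + frac * (s[hi] - s[lo])
    return pct(lower), pct(upper)

# For the values 0..100 the interval is approximately (2.5, 97.5):
print(reference_interval(list(range(101))))
```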

  12. Examining item-position effects in large-scale assessment using the Linear Logistic Test Model

    Directory of Open Access Journals (Sweden)

    CHRISTINE HOHENSINN

    2008-09-01

    Full Text Available When administering large-scale assessments, item-position effects are of particular importance because the applied test designs very often contain several test booklets with the same items presented at different test positions. Establishing such position effects would be most critical; it would mean that the estimated item parameters do not depend exclusively on the items’ difficulties due to content but also on their presentation positions. As a consequence, item calibration would be biased. By means of the linear logistic test model (LLTM), item-position effects can be tested. In this paper, the results of a simulation study demonstrating how the LLTM is indeed able to detect certain position effects in the framework of a large-scale assessment are presented first. Second, empirical item-position effects of a specific large-scale competence assessment in mathematics (4th grade students) are analyzed using the LLTM. The results indicate that a small fatigue effect seems to take place. The most important consequence of the given paper is that it is advisable to try pertinent simulation studies before an analysis of empirical data takes place; the reason is that, for the given example, the suggested likelihood-ratio test neither holds the nominal type-I risk nor qualifies as “robust”, and furthermore occasionally shows very low power.
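In the LLTM, each item's Rasch difficulty is decomposed into a weighted sum of basic parameters, which is how a position (fatigue) effect can enter the model alongside content difficulty. A minimal sketch, with made-up illustrative weights rather than the paper's estimates:

```python
import math

def lltm_difficulty(q_row, eta):
    """LLTM decomposition: an item's Rasch difficulty is a weighted sum
    of basic parameters eta, with weights q_row from the design matrix."""
    return sum(q * e for q, e in zip(q_row, eta))

def p_correct(theta, beta):
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

# Illustrative basic parameters: eta[0] = content difficulty,
# eta[1] = fatigue increment per test position (both values assumed).
eta = [0.5, 0.05]
early = lltm_difficulty([1, 1], eta)    # same item shown at position 1
late = lltm_difficulty([1, 20], eta)    # same item shown at position 20
```

Under a positive fatigue parameter, the identical item becomes harder late in the booklet, which is exactly the kind of position effect that would bias calibration if ignored.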

  13. Information-Processing on Intelligence Test Items: Some Response Components

    Science.gov (United States)

    Whitely, Susan E.

    1977-01-01

    A factor analysis was used to study the relationships among response time and accuracy scores for a verbal analogies test, as well as a number of experimental variables designed to measure a series of information processing stages of the analogies task. (CTM)

  14. Pretest Item Analyses Using Polynomial Logistic Regression: An Approach to Small Sample Calibration Problems Associated with Computerized Adaptive Testing.

    Science.gov (United States)

    Patsula, Liane N.; Pashley, Peter J.

    Many large-scale testing programs routinely pretest new items alongside operational (or scored) items to determine their empirical characteristics. If these pretest items pass certain statistical criteria, they are placed into an operational item pool; otherwise they are edited and re-pretested or simply discarded. In these situations, reliable…

  15. A New Method for Assessing the Statistical Significance in the Differential Functioning of Items and Tests (DFIT) Framework

    Science.gov (United States)

    Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O.

    2006-01-01

    A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests framework. In this new method, a cutoff score for each item is determined by obtaining a (1-alpha ) percentile rank score…

  16. A comparison of item response theory-based methods for examining differential item functioning in object naming test by language of assessment among older Latinos

    Directory of Open Access Journals (Sweden)

    Frances M. Yang

    2011-12-01

    Full Text Available Object naming tests are commonly included in neuropsychological test batteries. Differential item functioning (DIF) in these tests due to cultural and language differences may compromise the validity of cognitive measures in diverse populations. We evaluated 26 object naming items for DIF due to Spanish and English language translations among Latinos (n=1,159), mean age of 70.5 years old (Standard Deviation (SD)±7.2), using the following four item response theory-based approaches: Mplus/Multiple Indicator, Multiple Causes (Mplus/MIMIC; Muthén & Muthén, 1998-2011), Item Response Theory Likelihood Ratio Differential Item Functioning (IRTLRDIF/MULTILOG; Thissen, 1991, 2001), difwithpar/Parscale (Crane, Gibbons, Jolley, & van Belle, 2006; Muraki & Bock, 2003), and Differential Functioning of Items and Tests/MULTILOG (DFIT/MULTILOG; Flowers, Oshima, & Raju, 1999; Thissen, 1991). Overall, there was moderate to near perfect agreement across methods. Fourteen items were found to exhibit DIF and 5 items observed consistently across all methods, which were more likely to be answered correctly by individuals tested in Spanish after controlling for overall ability.

  17. A comparison of item response theory-based methods for examining differential item functioning in object naming test by language of assessment among older Latinos.

    Science.gov (United States)

    Yang, Frances M; Heslin, Kevin C; Mehta, Kala M; Yang, Cheng-Wu; Ocepek-Welikson, Katja; Kleinman, Marjorie; Morales, Leo S; Hays, Ron D; Stewart, Anita L; Mungas, Dan; Jones, Richard N; Teresi, Jeanne A

    2011-01-01

    Object naming tests are commonly included in neuropsychological test batteries. Differential item functioning (DIF) in these tests due to cultural and language differences may compromise the validity of cognitive measures in diverse populations. We evaluated 26 object naming items for DIF due to Spanish and English language translations among Latinos (n=1,159), mean age of 70.5 years old (Standard Deviation (SD)±7.2), using the following four item response theory-based approaches: Mplus/Multiple Indicator, Multiple Causes (Mplus/MIMIC; Muthén & Muthén, 1998-2011), Item Response Theory Likelihood Ratio Differential Item Functioning (IRTLRDIF/MULTILOG; Thissen, 1991, 2001), difwithpar/Parscale (Crane, Gibbons, Jolley, & van Belle, 2006; Muraki & Bock, 2003), and Differential Functioning of Items and Tests/MULTILOG (DFIT/MULTILOG; Flowers, Oshima, & Raju, 1999; Thissen, 1991). Overall, there was moderate to near perfect agreement across methods. Fourteen items were found to exhibit DIF and 5 items observed consistently across all methods, which were more likely to be answered correctly by individuals tested in Spanish after controlling for overall ability.

  18. Overcoming the effects of differential skewness of test items in scale construction

    Directory of Open Access Journals (Sweden)

    Johann M. Schepers

    2004-10-01

    Full Text Available The principal objective of the study was to develop a procedure for overcoming the effects of differential skewness of test items in scale construction. It was shown that the degree of skewness of test items places an upper limit on the correlations between the items, regardless of the contents of the items. If the items are ordered in terms of skewness, the resulting intercorrelation matrix forms a simplex or a pseudo-simplex. Factoring such a matrix results in a multiplicity of factors, most of which are artifacts. A procedure for overcoming this problem was demonstrated with items from the Locus of Control Inventory (Schepers, 1995). The analysis was based on a sample of 1,662 first-year university students. Summary (translated from Afrikaans): The main aim of the study was to develop a procedure to counteract the effects of differential skewness of test items in scale construction. It was shown that the degree of skewness of test items places an upper limit on the correlations between the items, regardless of their content. If the items are arranged according to degree of skewness, their intercorrelation matrix will form a simplex or pseudo-simplex. If such a matrix is subjected to factor analysis, it yields a multiplicity of factors, most of which are artifacts. A procedure for overcoming this problem was demonstrated using the items of the Locus of Control Inventory (Schepers, 1995). The analyses were based on a sample of 1,662 first-year university students.

  19. Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-01-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…

  20. Item response theory analyses of the Cambridge Face Memory Test (CFMT).

    Science.gov (United States)

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin Williams; Fiset, Daniel; Van Gulick, Ana E; Ryan, Kaitlin F; Gauthier, Isabel

    2015-06-01

    We evaluated the psychometric properties of the Cambridge Face Memory Test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bifactor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and 3 specific factors clustered by targets of CFMT. However, the 3 specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and 2 age groups (age ≤ 20 vs. age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT.

  1. Development of Abbreviated Eight-Item Form of the Penn Verbal Reasoning Test

    Science.gov (United States)

    Bilker, Warren B.; Wierzbicki, Michael R.; Brensinger, Colleen M.; Gur, Raquel E.; Gur, Ruben C.

    2014-01-01

    The ability to reason with language is a highly valued cognitive capacity that correlates with IQ measures and is sensitive to damage in language areas. The Penn Verbal Reasoning Test (PVRT) is a 29-item computerized test for measuring abstract analogical reasoning abilities using language. The full test can take over half an hour to administer, which limits its applicability in large-scale studies. We previously described a procedure for abbreviating a clinical rating scale and a modified procedure for reducing tests with a large number of items. Here we describe the application of the modified method to reducing the number of items in the PVRT to a parsimonious subset of items that accurately predicts the total score. As in our previous reduction studies, a split sample is used for model fitting and validation, with cross-validation to verify results. We find that an 8-item scale predicts the total 29-item score well, achieving a correlation of .9145 for the reduced form for the model fitting sample and .8952 for the validation sample. The results indicate that a drastically abbreviated version, which cuts administration time by more than 70%, can be safely administered as a predictor of PVRT performance. PMID:24577310

  2. Using set covering with item sampling to analyze the infeasibility of linear programming test assembly models

    NARCIS (Netherlands)

    Huitzing, HA

    2004-01-01

    This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model…

  3. Item Transformation for Computer Assisted Language Testing: The Adaptation of the Spanish University Entrance Examination

    Science.gov (United States)

    Laborda, Jesus Garcia; Bakieva, Margarita; Gonzalez-Such, Jose; Pavon, Ana Sevilla

    2010-01-01

    Since the Spanish educational system is changing and promoting the use of online tests, it was necessary to study the transformation of test items in the "Spanish University Entrance Examination" (IB P.A.U.) so as to diminish the effect of test delivery changes (through its computerization) and affect the current model as little as possible. The…

  4. Psychometric equivalence of recorded spondaic words as test items.

    Science.gov (United States)

    Bilger, R C; Matthies, M L; Meyer, T A; Griffiths, S K

    1998-06-01

    In the determination of the speech-reception threshold (SRT), spondaic words are assumed to be homogeneous with respect to intelligibility; and the assumption of equal intelligibility requires that the words be comparable for all signal levels. Previous attempts to assess the equal intelligibility assumption using word thresholds as the sole criterion are not an adequate basis for specifying the equality of intelligibility. In the present study, the recorded spondaic words (Tillman recording) were analyzed in an attempt to create a more homogeneous set of spondaic words for future laboratory work. To achieve this goal, the data reported by Young, Dudley, and Gunter (1982) and data collected in our laboratory were fitted to a logistic function (psychometric function) from which a 50% point (threshold) and slope were obtained. To specify their acoustical parameters, the recorded spondaic words were digitized and the RMS level and duration of each syllable and word were calculated. None of the RMS or duration measures were correlated with word thresholds, so no attempt was made to equate level or duration. On the other hand, when the threshold of each word was adjusted to equal the mean threshold of the set (n = 36), the dispersion among word thresholds and slopes was greatly reduced. Further, we recommend that small sets of "equally intelligible" spondaic words not be used for clinical testing because set size is a strong factor in determining threshold for spondees (Meyer & Bilger, 1997; Punch & Howard, 1985).

  5. Classical item and test analysis with graphics: the ViSta-CITA program.

    Science.gov (United States)

    Ledesma, Rubén Daniel; Molina, J Gabriel

    2009-11-01

    Current advances in test development theory have mostly been influenced by item response theory. Notwithstanding this, classical test theory still plays a major part in the development of tests for applied educational and behavioral research. This article describes ViSta-CITA, a computer program that implements a set of classical item and test analysis methods that incorporate innovative graphics whose aim is to provide deeper insight into analysis results. Such an aim is achieved through the SpreadPlot, a graphical method designed to display multiple, simultaneous, interactive views of the analysis results. It behaves on a dynamic basis, so that users' changes (e.g., selecting a subset of items) are automatically updated in the graphical windows showing the analysis results. Moreover, ViSta-CITA is freely available, and its code is open to modifications or additions by the user. Features such as these constitute useful tools for research and teaching purposes related to test development.
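The core classical statistics that programs like ViSta-CITA report can be sketched in a few lines. The following is an illustrative pure-Python version (not ViSta-CITA's code) of two of them: item difficulty (proportion correct) and the corrected item-total point-biserial correlation, where "corrected" means the item is removed from the total score it is correlated with:

```python
from statistics import mean, pstdev

def item_analysis(responses):
    """Classical test theory item statistics (illustrative sketch).
    responses: list of per-examinee lists of 0/1 item scores.
    Returns per-item tuples of (difficulty, corrected point-biserial)."""
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    results = []
    for j in range(n_items):
        scores = [r[j] for r in responses]
        p = mean(scores)                                 # difficulty: proportion correct
        rest = [t - s for t, s in zip(totals, scores)]   # total with item j removed
        sx, sy = pstdev(scores), pstdev(rest)
        if sx == 0 or sy == 0:
            r_pb = 0.0                                   # zero-variance item: undefined, report 0
        else:
            cov = mean(s * t for s, t in zip(scores, rest)) - mean(scores) * mean(rest)
            r_pb = cov / (sx * sy)
        results.append((p, r_pb))
    return results

# Four examinees, three items; item 2 (index 1) is answered correctly by everyone.
data = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]]
stats = item_analysis(data)
```

An item everyone answers correctly (difficulty 1.0) carries no variance and therefore cannot discriminate, which is why such items are flagged during test development.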

  6. Detecting Differential Item Functioning and Differential Test Functioning on Math School Final-exam

    OpenAIRE

    - Mansyur; - Muliana

    2016-01-01

    This study aims at finding out the characteristics of Differential Item Functioning (DIF) and Differential Test Functioning (DTF) on the school final exam for the Math subject based on Item Response Theory (IRT). The subjects of this study were the questions and all of the students’ answer sheets, chosen by using a convenience sampling method; 286 responses were obtained, consisting of 147 male and 149 female students’ responses. The data of this study were collected using a documentation technique by quoting the resp...

  7. Peeking into personality test answers: inter- and intraindividual variety in item interpretations.

    Science.gov (United States)

    Arro, Grete

    2013-03-01

    Personality research today relies largely on inventories that have neither unambiguously interpretable items nor responses. The substantive process of generating the test answer is rarely investigated, and thus the possible field of meanings out of which the answer is created remains hidden. In order to investigate the possible array of spontaneous answers to personality test items, a situative open-ended personality inventory was developed to determine individuals' ways of interpreting personality test items and the personality descriptions relevant to them. The children's sample (N = 704 of 10-13 year olds) answered five free-response contextualized personality test questions, each related to one of the Five Factor Model personality dimensions. It was revealed that there is no universal interpretation of an item. First, different children's answers to the same question described different personality dimensions: a substantial number of the respondents' answers did not reflect the personality domain assumed in an item. So there are several ways to interpret test questions; answers may refer to different personality dimensions, and not necessarily the one assumed by the researcher. Second, a number of children mentioned more than one personality trait for one item, indicating that even within one person there may be several relevant interpretations of the same item. Considering personality traits as occurring one by one and mutually exclusively during personality test answering may be artificial; in reality, trait combinations may reflect the actual reaction. In sum, the results suggest there is no single predictable interpretational trajectory in the meaning construction process when semiotically mediated constructs, e.g., personality reflections, are assessed.

  8. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
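The three-parameter logistic model applied above has a standard closed form: the probability of a correct answer is the guessing floor c plus the remaining probability mass scaled by a logistic function of ability. A minimal sketch with illustrative parameter values (not the TUV estimates produced by parscale):

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) IRT model: probability that an
    examinee of ability theta answers correctly, given discrimination a,
    difficulty b, and pseudo-guessing parameter c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic term equals 0.5, so the probability sits
# halfway between the guessing floor c and 1.0 (here 0.2 + 0.8 * 0.5).
p = p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)  # -> 0.6
```

This also shows why the guessing parameter matters for multiple-choice data: even very low-ability examinees keep a success probability near c, which classical proportion-correct indices conflate with item difficulty.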

  9. Development of Items for a Pedagogical Content Knowledge Test Based on Empirical Analysis of Pupils' Errors

    Science.gov (United States)

    Jüttner, Melanie; Neuhaus, Birgit J.

    2012-05-01

    In view of the lack of instruments for measuring biology teachers' pedagogical content knowledge (PCK), this article reports on a study about the development of PCK items for measuring teachers' knowledge of pupils' errors and ways of dealing with them. This study investigated 9th and 10th grade German pupils' (n = 461) drawings in an achievement test about the knee-jerk in biology, which were analysed using inductive qualitative content analysis. The empirical data were used for the development of the items in the PCK test. The validity of the items was assessed with think-aloud interviews of German secondary school teachers (n = 5). Once the items were settled, their reliability was tested using the results of German secondary school biology teachers (n = 65) who took the PCK test. The results indicated that these items are satisfactorily reliable (Cronbach's alpha values ranged from 0.60 to 0.65). We suggest that a larger sample size and American biology teachers be used in our further studies. The findings of this study about teachers' professional knowledge from the PCK test could provide new information about the influence of teachers' knowledge on their pupils' understanding of biology and their possible errors in learning biology.
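Cronbach's alpha values like those quoted above come from the standard formula relating the sum of item variances to the variance of total scores. A small illustrative sketch (not the study's computation), using toy data:

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a score matrix (rows = examinees,
    columns = items): (k/(k-1)) * (1 - sum(item variances) / total variance).
    Illustrative sketch; assumes at least two items and a nonzero
    total-score variance."""
    k = len(responses[0])
    item_vars = [pvariance([r[j] for r in responses]) for j in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Perfectly parallel items (every examinee gets all or none correct)
# yield the maximum internal consistency, alpha = 1.0.
alpha = cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
```

Values around 0.60-0.65, as reported for the PCK items, indicate that the items covary moderately but far from perfectly with the total score.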

  10. QUALITY TESTING OF HEAT TREATMENT OF MEDIUM-CARBON STEEL CONSTRUCTION ITEMS BASED ON THE BIPOLAR PULSED REMAGNETIZATION

    Directory of Open Access Journals (Sweden)

    V. F. Matyuk

    2014-01-01

    Full Text Available The features of bipolar pulsed remagnetization of medium-carbon construction steel items for testing the heat-treatment temperature and structure of these items are discussed, and methods of bipolar pulsed remagnetization that provide for the testing of items made of the steels considered are suggested.

  11. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    Science.gov (United States)

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
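The logistic-regression DIF procedure whose power is analyzed here compares nested models for a single item: a reduced model with ability only, and a full model adding a group term (uniform DIF; a further group-by-ability interaction would capture nonuniform DIF). The following is a self-contained sketch on synthetic data; the gradient-ascent fit stands in for a statistics package, and the sample size, coefficients, and seed are all made up for illustration:

```python
import math
import random

def fit_logistic(X, y, lr=0.5, iters=2000):
    """Plain gradient-ascent logistic regression (pure-Python sketch).
    X: rows of features (no intercept column); y: 0/1 responses.
    Returns (weights, log-likelihood at the fitted weights)."""
    rows = [[1.0] + list(x) for x in X]        # prepend intercept
    w = [0.0] * len(rows[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / len(y) for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        ll += math.log(p if yi else 1.0 - p)
    return w, ll

# Synthetic responses to one item: ability drives correctness, and the
# focal group (group=1) finds the item harder at equal ability (uniform DIF).
random.seed(0)
data = []
for _ in range(400):
    ability, group = random.gauss(0, 1), random.randint(0, 1)
    z = 1.0 * ability - 1.2 * group            # illustrative true DIF effect
    data.append((ability, group, 1 if random.random() < 1 / (1 + math.exp(-z)) else 0))

ys = [y for _, _, y in data]
_, ll_reduced = fit_logistic([[a] for a, _, _ in data], ys)    # ability only
_, ll_full = fit_logistic([[a, g] for a, g, _ in data], ys)    # ability + group
lr_stat = 2 * (ll_full - ll_reduced)   # ~ chi-square(1) under no uniform DIF
```

Comparing `lr_stat` against the chi-square(1) critical value (3.84 at alpha = .05) gives the likelihood-ratio DIF test; the power formulas in the paper predict how often this statistic exceeds the critical value at a given sample size and effect size.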

  12. Cognitive Diagnostic Models for Tests with Multiple-Choice and Constructed-Response Items

    Science.gov (United States)

    Kuo, Bor-Chen; Chen, Chun-Hua; Yang, Chih-Wei; Mok, Magdalena Mo Ching

    2016-01-01

    Traditionally, teachers evaluate students' abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students' skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic…

  13. The Relative Importance of Persons, Items, Subtests, and Languages to TOEFL Test Variance.

    Science.gov (United States)

    Brown, James Dean

    1999-01-01

    Explored the relative contributions to Test of English as a Foreign Language (TOEFL) score dependability of various numbers of persons, items, subtests, languages, and their various interactions. Sampled 15,000 test takers, 1000 each from 15 different language backgrounds. (Author/VWL)

  14. Test-retest reliability of Eurofit Physical Fitness items for children with visual impairments

    NARCIS (Netherlands)

    Houwen, Suzanne; Visscher, Chris; Hartman, Esther; Lemmink, Koen A. P. M.

    2006-01-01

    The purpose of this study was to examine the test-retest reliability of physical fitness items from the European Test of Physical Fitness (Eurofit) for children with visual impairments. A sample of 21 children, ages 6-12 years, that were recruited from a special school for children with visual impairments…

  16. Fostering a student's skill for analyzing test items through an authentic task

    Science.gov (United States)

    Setiawan, Beni; Sabtiawan, Wahyu Budi

    2017-08-01

    Analyzing test items is a skill that must be mastered by prospective teachers in order to determine the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill at analyzing test items, covering validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students of the science education study program, faculty of science and mathematics, Universitas Negeri Surabaya, enrolled in the assessment course. The research design was a one-group posttest design. The treatment was an authentic task that required the students to develop test items and then analyze the items like professional assessors using Microsoft Excel and the Anates software. The data obtained were analyzed descriptively: the students' skill levels were presented and then related to theories and previous empirical studies. The research showed that the task helped the students acquire the skills. Thirty-one students got a perfect score for the analysis, five students achieved 97% mastery, two students had 92% mastery, and another two students got 89% and 79% mastery. The implication of the finding is that when students are given authentic tasks that force them to perform like professionals, they are more likely to achieve professional skills by the end of learning.

  17. An Enhanced Automated Test Item Creation Based on Learners Preferred Concept Space

    Directory of Open Access Journals (Sweden)

    Mohammad AL-Smadi

    2016-03-01

    Full Text Available Recently, research has become increasingly interested in developing tools that are able to automatically create test items out of text-based learning contents. Such tools might not only support instructors in creating tests or exams but also learners in self-assessing their learning progress. This paper presents an enhanced automatic question-creation tool (EAQC) that has been recently developed. The EAQC extracts the most important key phrases (concepts) out of a textual learning content and automatically creates test items based on these concepts. Moreover, this paper discusses two studies for the evaluation of the EAQC's application in real learning settings. The first study showed that concepts extracted by the EAQC often, but not always, reflect the concepts extracted by learners. Learners typically extracted fewer concepts than the EAQC, and there was great inter-individual variation between learners with regard to which concepts they experienced as relevant. Accordingly, the second study investigated whether the functionality of the EAQC can be improved in such a way that valid test items are created when the tool is fed with concepts provided by learners. The results showed that the quality of the semi-automated creation of test items was satisfactory. Moreover, this demonstrates the EAQC's flexibility in adapting its workflow to the individual needs of learners.

  18. Effects of Item Parameter Drift on Vertical Scaling with the Nonequivalent Groups with Anchor Test (NEAT) Design

    Science.gov (United States)

    Ye, Meng; Xin, Tao

    2014-01-01

    The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…

  19. Optimal Item Pool Design for a Highly Constrained Computerized Adaptive Test

    Science.gov (United States)

    He, Wei

    2010-01-01

    Item pool quality has been regarded as one important factor to help realize enhanced measurement quality for the computerized adaptive test (CAT) (e.g., Flaugher, 2000; Jensema, 1977; McBride & Wise, 1976; Reckase, 1976; 2003; van der Linden, Ariel, & Veldkamp, 2006; Veldkamp & van der Linden, 2000; Xing & Hambleton, 2004). However, studies are…

  20. Improving Item Response Theory Model Calibration by Considering Response Times in Psychological Tests

    Science.gov (United States)

    Ranger, Jochen; Kuhn, Jorg-Tobias

    2012-01-01

    Research findings indicate that response times in personality scales are related to the trait level according to the so-called speed-distance hypothesis. Against this background, Ferrando and Lorenzo-Seva proposed a latent trait model for the responses and response times in a test. The model consists of two components, a standard item response…

  1. Developing and Validating Test Items for First-Year Computer Science Courses

    Science.gov (United States)

    Vahrenhold, Jan; Paul, Wolfgang

    2014-01-01

    We report on the development, validation, and implementation of a collection of test items designed to detect misconceptions related to first-year computer science courses. To this end, we reworked the development scheme proposed by Almstrum et al. ("SIGCSE Bulletin" 38(4):132-145, 2006) to include students' artifacts and to…

  2. Reading ability and print exposure: item response theory analysis of the author recognition test.

    Science.gov (United States)

    Moore, Mariah; Gordon, Peter C

    2015-12-01

    In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
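The scoring variants discussed above are simple to state: the standard ART score is names selected minus foils selected, and the proposed alternative weights foil selections more heavily. A sketch of the two schemes (the penalty of 2.0 below is illustrative, not the value derived in the study):

```python
def art_score(names_selected, foils_selected, foil_penalty=1.0):
    """Author Recognition Test score (sketch). With foil_penalty=1.0
    this reproduces the standard names-minus-foils score; the study
    above argues that a heavier foil penalty better tracks reading
    behavior, since foil selections indicate guessing."""
    return names_selected - foil_penalty * foils_selected

standard = art_score(20, 3)                      # 20 - 3 = 17
penalized = art_score(20, 3, foil_penalty=2.0)   # 20 - 2*3 = 14
```

Two test takers who recognize the same number of authors are thus separated by how freely they endorse foils, which is the guessing behavior the heavier penalty is meant to discourage.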

  4. Some New Item Selection Criteria for Adaptive Testing. Research Report 94-6.

    Science.gov (United States)

    Veerkamp, Wim J. J.; Berger, Martijn P. F.

    In this study some alternative item selection criteria for adaptive testing are proposed. These criteria take into account the uncertainty of the ability estimates. A general weighted information criterion is suggested of which the usual maximum information criterion and the suggested alternative criteria are special cases. A simulation study was…

  5. A latent trait look at pretest-posttest validation of criterion-referenced test items

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1981-01-01

    Since Cox and Vargas (1966) introduced their pretest-posttest validity index for criterion-referenced test items, a great number of additions and modifications have followed. All are based on the idea of gain scoring; that is, they are computed from the differences between proportions of pretest and

  6. A hierarchical framework for modeling speed and accuracy on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.

    2005-01-01

    Current modeling of response times on test items has been influenced by the experimental paradigm of reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with respo

  7. A hierarchical framework for modeling speed and accuracy on test items

    NARCIS (Netherlands)

    van der Linden, Willem J.

    2007-01-01

    Current modeling of response times on test items has been strongly influenced by the paradigm of experimental reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly w

  8. Developing and Validating Test Items for First-Year Computer Science Courses

    Science.gov (United States)

    Vahrenhold, Jan; Paul, Wolfgang

    2014-01-01

    We report on the development, validation, and implementation of a collection of test items designed to detect misconceptions related to first-year computer science courses. To this end, we reworked the development scheme proposed by Almstrum et al. ("SIGCSE Bulletin" 38(4):132-145, 2006) to include students' artifacts and to…

  9. Learning to Think Spatially: What Do Students "See" in Numeracy Test Items?

    Science.gov (United States)

    Diezmann, Carmel M.; Lowrie, Tom

    2012-01-01

    Learning to think spatially in mathematics involves developing proficiency with graphics. This paper reports on 2 investigations of spatial thinking and graphics. The first investigation explored the importance of graphics as 1 of 3 communication systems (i.e. text, symbols, graphics) used to provide information in numeracy test items. The results…

  10. ITEM ANALYSIS IN MULTIPLE-CHOICE LISTENING TESTS FROM CILS CERTIFICATE IN ITALIAN AS A FOREIGN LANGUAGE

    Directory of Open Access Journals (Sweden)

    Paulo Torresan

    2015-12-01

This paper analyses three multiple-choice listening tests from the CILS Certificate in Italian as a Foreign Language (level B1, summer sessions 2009 and 2012). Item analysis involves examining the behavior of each individual item based on statistical data on answers from a sample. It offers answers to questions such as: do the items allow for sufficient discrimination between candidates of different skill levels? Do the keys and distractors work appropriately? Our investigation reveals certain issues of undercalibration, non-correspondence between items and information present in the text, and item distribution. In one case, the item's construction risks misleading the test taker (item #2, summer session 2012). As well as providing an example of item analysis, this study allows the reader to gain awareness of how difficult it is to design an exercise widely used in both testing and teaching centers, that is, the multiple-choice question.

  11. Creation of New Items and Forms for the Project A Assembling Objects Test

    Science.gov (United States)

    1994-08-01

...correct; D, a measure of item discrimination - the rpbi between the item score (correct or incorrect) and the total score on the 36 original items...; D2, another measure of item discrimination - the rpbis between the item score and the total score on the 18 original items of the same type (marked...
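The rpbi in this record denotes the point-biserial correlation between a dichotomous item score and the total test score, a standard index of item discrimination. A sketch of how it can be computed (the function name is mine, and this is the textbook formula, not code from the report):

```python
import statistics

def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score:
    r_pbi = (M1 - M0) / s_total * sqrt(p * q),
    where M1/M0 are the mean totals of passers/failers, p is the pass rate,
    q = 1 - p, and s_total is the population SD of the totals."""
    n = len(item_scores)
    m1 = statistics.mean(t for i, t in zip(item_scores, total_scores) if i == 1)
    m0 = statistics.mean(t for i, t in zip(item_scores, total_scores) if i == 0)
    p = sum(item_scores) / n
    s = statistics.pstdev(total_scores)
    return (m1 - m0) / s * (p * (1 - p)) ** 0.5

print(round(point_biserial([1, 1, 0, 0], [10, 8, 6, 4]), 3))  # 0.894
```

The result equals the ordinary Pearson correlation between the 0/1 item variable and the totals.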

  12. Plausibility Functions of Iowa Vocabulary Test Items Estimated by the Simple Sum Procedure of the Conditional P.D.F. Approach.

    Science.gov (United States)

    1984-12-01

...Vocabulary Test items. In so doing, the normal ogive model was adopted for the correct answers of those items, and those items were used as the substitute for...of informative distractors for certain test items. The model validation study accompanying it indicates that for most items the normal ogive model is...

  13. Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients

    Directory of Open Access Journals (Sweden)

    Oberguggenberger Anne S

    2011-03-01

Introduction: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. Methods: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. Results: Based on the literature search, a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty, a list of 44 fatigue items was generated. Patient interviews (n = 52) resulted in 12 revisions of wording and translations. Discussion: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue-CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale.

  14. Competency-based classification of COMLEX-USA cognitive examination test items.

    Science.gov (United States)

    Langenau, Erik; Pugliano, Gina; Roberts, William

    2011-06-01

The Comprehensive Osteopathic Medical Licensing Examination-USA (COMLEX-USA) currently assesses osteopathic medical knowledge via a series of 3 progressive cognitive examinations and 1 clinical skills assessment. In 2009, the National Board of Osteopathic Medical Examiners created the Fundamental Osteopathic Medical Competencies (FOMC) document to outline the essential competencies required for the practice of osteopathic medicine. The aim of this study was to measure the distribution and extent to which cognitive examination items of the current COMLEX-USA series assess knowledge of each of the medical competencies included in the FOMC document. Eight graduate medical education panelists with expertise in competency-based assessment reviewed 1046 multiple-choice examination items extracted from the 3 COMLEX-USA cognitive examinations (Level 1, Level 2-Cognitive Evaluation, and Level 3) used during the 2008-2009 testing cycle. The 8 panelists individually judged each item to classify it as 1 of the 6 fundamental osteopathic medical competencies described in the FOMC document. Panelists made 8368 judgments. The majority of the sample examination items were classified as either patient care (3343 [40%]) or medical knowledge (4236 [51%]). Panelists also reported these 2 competencies as being the easiest to define, teach, and assess. The frequency of medical knowledge examination items decreased throughout the COMLEX-USA series (69%, 43%, 40%); conversely, items classified as interpersonal and communication skills, systems-based practice, practice-based learning and improvement, and professionalism increased throughout the 3-examination series. Results indicate that knowledge of each of the 6 competencies is being assessed to some extent with the current COMLEX-USA format. These findings provide direction for the enhancement of existing examinations and development of new assessment tools.

15. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

    Directory of Open Access Journals (Sweden)

    Marina Fiori

The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.

  16. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

    Science.gov (United States)

    Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

    2014-01-01

The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.

  17. Prediction of true test scores from observed item scores and ancillary data.

    Science.gov (United States)

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.
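The paper's estimator is multivariate and relies on error covariances estimated from repeat test takers; the simplest classical-test-theory special case of a best linear predictor of a true score is Kelley's regression estimate, sketched below with illustrative numbers (the function name is mine):

```python
def kelley_true_score(observed, reliability, group_mean):
    """Kelley's classical-test-theory best linear predictor of a true score:
    T_hat = reliability * X + (1 - reliability) * mean(X).
    The observed score is shrunk toward the group mean; the lower the
    reliability, the stronger the shrinkage."""
    return reliability * observed + (1 - reliability) * group_mean

print(kelley_true_score(30.0, 0.8, 25.0))  # shrinks 30 toward the mean of 25
```

At reliability 1.0 the predictor returns the observed score unchanged; at 0.0 it returns the group mean.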

  18. Domain-General and Domain-Specific Creative-Thinking Tests: Effects of Gender and Item Content on Test Performance

    Science.gov (United States)

    Hong, Eunsook; Peng, Yun; O'Neil, Harold F., Jr.; Wu, Junbin

    2013-01-01

    The study examined the effects of gender and item content of domain-general and domain-specific creative-thinking tests on four subscale scores of creative-thinking (fluency, flexibility, originality, and elaboration). Chinese tenth-grade students (234 males and 244 females) participated in the study. Domain-general creative thinking was measured…

  19. Item and associative memory in amnestic mild cognitive impairment: performance on standardized memory tests.

    Science.gov (United States)

    Troyer, Angela K; Murphy, Kelly J; Anderson, Nicole D; Hayman-Abello, Brent A; Craik, Fergus I M; Moscovitch, Morris

    2008-01-01

    The earliest neuroanatomical changes in amnestic mild cognitive impairment (aMCI) involve the hippocampus and entorhinal cortex, structures implicated in the integration and learning of associative information. The authors hypothesized that individuals with aMCI would have impairments in associative memory above and beyond the known impairments in item memory. A group of 29 individuals with aMCI and 30 matched control participants were administered standardized tests of object-location recall and symbol-symbol recall, from which both item and associative recall scores were derived. As expected, item recall was impaired in the aMCI group relative to controls. Associative recall in the aMCI group was even more impaired than was item recall. The best group discriminators were measures of associative recall, with which the sensitivity and specificity for detecting aMCI were 76% and 90% for symbol-symbol recall and were 86% and 97% for object-location recall. Associative recall may be particularly sensitive to early cognitive change in aMCI, because this ability relies heavily on the medial temporal lobe structures that are affected earliest in aMCI. Incorporating measures of associative recall into clinical evaluations of individuals with memory change may be useful for detecting aMCI.

  20. Writing multiple-choice test items that promote and measure critical thinking.

    Science.gov (United States)

    Morrison, S; Free, K W

    2001-01-01

    Faculties are concerned about measurement of critical thinking especially since the National League for Nursing Accrediting Commission cited such measurement as a requirement for accreditation (NLNAC, 1997). Some writers and researchers (Alfaro-LeFevre, 1995; Blat, 1989; McPeck, 1981, 1990) describe the need to measure critical thinking within the context of a specific discipline. Based on McPeck's position that critical thinking is discipline-specific, guidelines for developing multiple-choice test items as a means of measuring critical thinking within the discipline of nursing are discussed. Specifically, criteria described by Morrison, Smith, and Britt (1996) for writing critical-thinking multiple-choice test items are reviewed and explained for promoting and measuring critical thinking.

  1. Latent Class Analysis of Differential Item Functioning on the Peabody Picture Vocabulary Test-III

    Science.gov (United States)

    Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J.

    2008-01-01

    This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…

  2. Differential functioning of mini-mental test items according to disease.

    Science.gov (United States)

    Prieto, G; Delgado, A R; Perea, M V; Ladera, V

    2011-10-01

Comparing the height of males and females would be impossible if the measuring device did not have the same properties for both populations. In a similar way, the cognitive level of diverse groups of patients should not be compared if the test has different measurement properties for these groups. Lack of Differential Item Functioning (DIF) is a condition for measurement invariance between populations. The most internationally used screening test for dementia, the MMSE (or Mini-Mental State Examination), has been analysed using an advanced psychometric technique, the Rasch model. The objective was to determine the invariance of mini-mental measurements from diverse groups: Parkinson's disease patients, Alzheimer's type dementia patients and normal subjects. The hypothesis was that the scores would not show DIF against any of these groups. The total sample was composed of 400 subjects. Significant differences between groups were found. However, the quantitative comparison only makes sense if no evidence against measurement invariance is found: given the kind of items showing DIF against Parkinson's disease patients, the MMSE seems to underestimate the cognitive level of these patients. Despite the extended use of this test, 11 items out of 30 show DIF, and consequently score comparisons between groups are not justified. Copyright © 2010 Sociedad Española de Neurología. Published by Elsevier Espana. All rights reserved.
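Under the Rasch model used in this study, the probability of passing an item depends only on the gap between person ability θ and item difficulty b; DIF means the same item behaves as if it had a different difficulty in one group at equal ability. A hypothetical illustration (the difficulty values below are invented, not the study's estimates):

```python
import math

def rasch_p(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An item showing DIF: at the same ability level it is effectively harder
# for one group than for the other (illustrative numbers only).
theta = 0.0
b_reference, b_focal = -0.5, 0.5
print(round(rasch_p(theta, b_reference), 2))  # 0.62
print(round(rasch_p(theta, b_focal), 2))      # 0.38
```

A score comparison between the groups would then understate the focal group's ability, which is the study's concern about the MMSE and Parkinson's disease patients.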

  3. The Effects of the Number of Options per Item and Student Ability on Test Validity and Reliability.

    Science.gov (United States)

    Trevisan, Michael S.; And Others

    1991-01-01

    The reliability and validity of multiple-choice tests were computed as a function of the number of options per item and student ability for 435 parochial high school juniors, who were administered the Washington Pre-College Test Battery. Results suggest the efficacy of the three-option item. (SLD)

  4. On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

    Science.gov (United States)

    Raykov, Tenko; Marcoulides, George A.

    2016-01-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…

  5. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    Science.gov (United States)

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  6. Salience of Guilty Knowledge Test items affects accuracy in realistic mock crimes.

    Science.gov (United States)

    Jokinen, Anne; Santtila, Pekka; Ravaja, Niklas; Puttonen, Sampsa

    2006-10-01

A Guilty Knowledge Test (GKT) measuring electrodermal reactions was carried out in order to investigate the quality of different questions and the validity of the test in a situation that resembled a true crime. Fifty participants were randomly assigned to commit one of two realistic mock crimes, and were later tested with GKTs concerning both the crime they had enacted and the one they had no knowledge of. Different scoring systems (SCRs and peak amplitudes, as well as raw and standardised scores) were employed and compared when analyzing the results. Although there were some false positives, the test was able to differentiate between the groups of guilty and innocent participants. With the best scoring systems, the test was able to classify up to 84% of the innocent and up to 76% of the guilty correctly according to a logistic regression analysis. ROC areas reflecting these same results reached values above .80. Questions on matters that demanded the participants' attention and were easier to remember had better discriminative power. With nearly all scoring methods, there was a significant interaction between the salience of the relevant items and the guilt of the participants. Participants reacted more strongly to salient relevant items when they were guilty, while no differing reactions were observed for the non-salient items between guilty and innocent participants. It is suggested that, although the Guilty Knowledge Test appears to be a valid measure of guilty knowledge even in crimes that are close to real crimes, the principles on which guilty knowledge test questions are constructed should be more clearly specified.

  7. A note on using alpha and stratified alpha to estimate the reliability of a test composed of item parcels.

    Science.gov (United States)

    Rae, Gordon

    2008-11-01

Several authors have suggested that prior to conducting a confirmatory factor analysis it may be useful to group items into a smaller number of item 'parcels' or 'testlets'. The present paper shows mathematically that coefficient alpha based on these parcel scores will only exceed alpha based on the entire set of items if W, the ratio of the average covariance of items between parcels to the average covariance of items within parcels, is greater than unity. If W is less than unity, however, and errors of measurement are uncorrelated, then stratified alpha will be a better lower bound to the reliability of a measure than the other two coefficients. Stratified alpha is also equal to the true reliability of a test when items within parcels are essentially tau-equivalent, provided that errors of measurement are uncorrelated.
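The coefficients compared in this note are straightforward to compute. A sketch of coefficient alpha and stratified alpha from persons-by-items score matrices (function names are mine; the formulas are the standard definitions, with stratified alpha discounting each parcel's variance by its unreliability):

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a persons-by-items matrix (list of score rows)."""
    k = len(items[0])
    item_vars = [statistics.pvariance([row[i] for row in items]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def stratified_alpha(parcels):
    """Stratified alpha: 1 - sum_j var_j * (1 - alpha_j) / var_total, where
    var_j is the variance of parcel j's sum score and alpha_j its alpha.
    Each parcel is a persons-by-items matrix with the same persons (rows)."""
    n = len(parcels[0])
    totals = [sum(sum(p[person]) for p in parcels) for person in range(n)]
    total_var = statistics.pvariance(totals)
    penalty = sum(
        statistics.pvariance([sum(p[person]) for person in range(n)])
        * (1 - cronbach_alpha(p))
        for p in parcels
    )
    return 1 - penalty / total_var

A = [[1, 1], [1, 0], [0, 1], [0, 0]]  # an unreliable parcel (alpha = 0)
B = [[1, 2], [2, 3], [3, 4], [4, 5]]  # a perfectly consistent parcel (alpha = 1)
print(round(stratified_alpha([A, B]), 3))  # 0.8
```

Only the unreliable parcel's variance contributes to the penalty term, so the composite reliability estimate falls below one.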

  8. Testing Three-Item Versions for Seven of Young's Maladaptive Schema

    Science.gov (United States)

    Blau, Gary; DiMino, John; Sheridan, Natalie; Pred, Robert S.; Beverly, Clyde; Chessler, Marcy

    2015-01-01

The Young Schema Questionnaire (YSQ) in either long-form (205-item) or short-form (75-item or 90-item) versions has demonstrated its clinical usefulness for assessing early maladaptive schemas. However, even a 75- or 90-item "short form", particularly when combined with other measures, can represent a lengthy…

  9. Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

    Science.gov (United States)

    Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

    2010-01-01

    This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
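The kernel-smoothing half of this procedure estimates each group's item response function nonparametrically: the proportion correct at an ability value is a kernel-weighted average over examinees with nearby ability estimates, and DIF appears as a gap between the reference- and focal-group curves that Cochran's Z then tests. A minimal Nadaraya-Watson sketch with a Gaussian kernel (the function name, data, and bandwidth are illustrative assumptions, not the study's settings):

```python
import math

def smoothed_irf(theta, abilities, responses, bandwidth=0.5):
    """Kernel-smoothed item response function: a Gaussian-kernel weighted
    average of 0/1 responses, evaluated at ability value theta."""
    weights = [math.exp(-0.5 * ((a - theta) / bandwidth) ** 2) for a in abilities]
    return sum(w * r for w, r in zip(weights, responses)) / sum(weights)

# Examinees below theta = 0 failed the item; those at or above it passed:
abilities = [-1.5, -0.5, 0.0, 0.5, 1.5]
responses = [0, 0, 1, 1, 1]
print(smoothed_irf(0.0, abilities, responses))
```

Evaluating the same function separately for the reference- and focal-group data at a grid of ability values yields the two curves whose pointwise differences the Z statistic assesses.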

  10. Two Test Items to Explore High School Students' Beliefs of Sample Size When Sampling from Large Populations

    Science.gov (United States)

    Bill, Anthony; Henderson, Sally; Penman, John

    2010-01-01

    Two test items that examined high school students' beliefs of sample size for large populations using the context of opinion polls conducted prior to national and state elections were developed. A trial of the two items with 21 male and 33 female Year 9 students examined their naive understanding of sample size: over half of students chose a…

  11. Developing Items to Measure Theory of Planned Behavior Constructs for Opioid Administration for Children: Pilot Testing.

    Science.gov (United States)

    Vincent, Catherine; Riley, Barth B; Wilkie, Diana J

    2015-12-01

    The Theory of Planned Behavior (TpB) is useful to direct nursing research aimed at behavior change. As proposed in the TpB, individuals' attitudes, perceived norms, and perceived behavior control predict their intentions to perform a behavior and subsequently predict their actual performance of the behavior. Our purpose was to apply Fishbein and Ajzen's guidelines to begin development of a valid and reliable instrument for pediatric nurses' attitudes, perceived norms, perceived behavior control, and intentions to administer PRN opioid analgesics when hospitalized children self-report moderate to severe pain. Following Fishbein and Ajzen's directions, we were able to define the behavior of interest and specify the research population, formulate items for direct measures, elicit salient beliefs shared by our target population and formulate items for indirect measures, and prepare and test our questionnaire. For the pilot testing of internal consistency of measurement items, Cronbach alphas were between 0.60 and 0.90 for all constructs. Test-retest reliability correlations ranged from 0.63 to 0.90. Following Fishbein and Ajzen's guidelines was a feasible and organized approach for instrument development. In these early stages, we demonstrated good reliability for most subscales, showing promise for the instrument and its use in pain management research. Better understanding of the TpB constructs will facilitate the development of interventions targeted toward nurses' attitudes, perceived norms, and/or perceived behavior control to ultimately improve their pain behaviors toward reducing pain for vulnerable children. Copyright © 2015 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.

  12. Effects of three combinations of plyometric and weight training programs on selected physical fitness test items.

    Science.gov (United States)

    Ford, H T; Puckett, J R; Drummond, J P; Sawyer, K; Gantt, K; Fussell, C

    1983-06-01

    To determine the effects of prescribed training programs on 5 physical fitness test items, each of 50 high school boys participated for 10 wk. in one of three programs (wrestling, softball, and plyometrics; weight training; and weight training and plyometrics). (a) On the sit-ups, 40-yd. dash, vertical jump, and pull-ups, each group improved significantly from pre- to posttest. (b) On the shuttle run, none of the groups improved significantly from pre- to posttest. (c) On the vertical jump, groups had a significant effect, but the interaction was nonsignificant. No effects were significant.

  13. Science Library of Test Items. Volume Eleven. Mastery Testing Programme. [Mastery Tests Series 3.] Tests M27-M38.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 27 through 38 include: (27) reading a grid plan; (28) identifying common invertebrates; (29) characteristics of invertebrates; (30) identifying elements; (31) using scientific notation part I; (32) classifying minerals; (33) predicting the…

  14. Effects of L1 Definitions and Cognate Status of Test Items on the Vocabulary Size Test

    Science.gov (United States)

    Elgort, Irina

    2013-01-01

    This study examines the development and evaluation of a bilingual Vocabulary Size Test (VST, Nation, 2006). A bilingual (English-Russian) test was developed and administered to 121 intermediate proficiency EFL learners (native speakers of Russian), alongside the original monolingual (English-only) version of the test. A comparison of the bilingual…

  15. Does Test Item Performance Increase with Test-to-Standards Alignment?

    Science.gov (United States)

    Traynor, Anne

    2017-01-01

    Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or "alignment,"…

  16. Error analysis and passage dependency of test items from a standardized test of multiple-sentence reading comprehension for aphasic and non-brain-damaged adults.

    Science.gov (United States)

    Nicholas, L E; Brookshire, R H

    1987-11-01

Aphasic and non-brain-damaged adults were tested with two forms of the Nelson Reading Skills Test (NRST; Hanna, Schell, & Schreiner, 1977). The NRST is a standardized measure of silent reading for students in Grades 3 through 9 and assesses comprehension of information at three levels of inference (literal, translational, and higher level). Subjects' responses to NRST test items were evaluated to determine if their performance differed on literal, translational, and higher level items. Subjects' performance was also evaluated to determine the passage dependency of NRST test items--the extent to which readers had to rely on information in the NRST reading passages to answer test items. Higher level NRST test items (requiring complex inferences) were significantly more difficult for both non-brain-damaged and aphasic adults than literal items (not requiring inferences) or translational items (requiring simple inferences). The passage dependency of NRST test items for aphasic readers was higher than those reported by Nicholas, MacLennan, and Brookshire (1986) for multiple-sentence reading tests designed for aphasic adults. This suggests that the NRST is a more valid measure of the multiple-sentence reading comprehension of aphasic adults than the other tests evaluated by Nicholas et al. (1986).

  17. Developing energy and momentum conceptual survey (EMCS) with four-tier diagnostic test items

    Science.gov (United States)

    Afif, Nur Faadhilah; Nugraha, Muhammad Gina; Samsudin, Achmad

    2017-05-01

Students' conceptions of work and energy are important to support the learning process in the classroom. For that reason, a diagnostic test instrument is needed to diagnose students' conceptions of work and energy. As a result, the researchers decided to develop the Energy and Momentum Conceptual Survey (EMCS) test instrument into four-tier diagnostic test items. This research is organized as the first step of four-tier-test-formatted EMCS development as a diagnostic test instrument on work and energy. The research method used the 4D model (Defining, Designing, Developing and Disseminating). The developed instrument was tested on 39 students at a senior high school. The results showed that the four-tier-test-formatted EMCS is able to diagnose students' levels of conceptual understanding of work and energy. It can be concluded that the four-tier-test-formatted EMCS is a promising diagnostic test instrument for distinguishing students who understand the concepts, hold misconceptions, or do not understand the concepts of work and energy at all.

  18. Development of the four-item Letter and Shape Drawing test (LSD-4): A brief bedside test of visuospatial function.

    Science.gov (United States)

    Williams, Olugbenga Alaba; O'Connell, Henry; Leonard, Maeve; Awan, Fahad; White, Debbie; McKenna, Frank; Hannigan, Ailish; Cullen, Walter; Exton, Chris; Enudi, Walter; Dunne, Colum; Adamis, Dimitrios; Meagher, David

    2017-01-01

    Conventional bedside tests of visuospatial function such as the Clock Drawing Test (CDT) and Intersecting Pentagons Test (IPT) lack consistency in delivery and interpretation. We compared performance on a novel test of visuospatial ability - the LSD - with the IPT, CDT and MMSE in 180 acute elderly medical inpatients [mean age 79.7±7.1 (range 62-96); 91 females (50.6%)]. 124 (69%) scored ≤23 on the MMSE; 60 with mild (score 18-23) and 64 with severe (score ≤17) impairment. 78 (43%) scored ≥6 on the CDT, while for the IPT, 87 (47%) scored ≥4. The CDT and IPT agreed on the classification of 138 patients (77%), with modest to strong agreement with the MMSE categories. Correlation between the LSD and visuospatial tests was high. A four-item version of the LSD incorporating items 1, 10, 12 and 15 had high correlation with the LSD-15 and strong association with MMSE categories. The LSD-4 provides a brief and easily interpreted bedside test of visuospatial function that has high coverage of elderly patients with neurocognitive impairment, good agreement with conventional tests of visuospatial ability and favourable ability to identify significant cognitive impairment.

  19. Adaptation and validation into Portuguese language of the six-item cognitive impairment test (6CIT).

    Science.gov (United States)

    Apóstolo, João Luís Alves; Paiva, Diana Dos Santos; Silva, Rosa Carla Gomes da; Santos, Eduardo José Ferreira Dos; Schultz, Timothy John

    2017-07-25

    The six-item cognitive impairment test (6CIT) is a brief cognitive screening tool that can be administered to older people in 2-3 min. The aims were to adapt the 6CIT into European Portuguese and determine its psychometric properties based on a sample recruited from several contexts (nursing homes; universities for older people; day centres; primary health care units). The original 6CIT was translated into Portuguese and the draft Portuguese version (6CIT-P) was back-translated and piloted. The accuracy of the 6CIT-P was assessed by comparison with the Portuguese Mini-Mental State Examination (MMSE). A convenience sample of 550 older people from various geographical locations in the north and centre of the country was used. The test-retest reliability coefficient was high (r = 0.95). The 6CIT-P also showed good internal consistency (α = 0.88), and corrected item-total correlations ranged between 0.32 and 0.90. Total 6CIT-P and MMSE scores were strongly correlated. The proposed 6CIT-P threshold for cognitive impairment is ≥10 in the Portuguese population, which gives sensitivity of 82.78% and specificity of 84.84%. The accuracy of the 6CIT-P, as measured by area under the ROC curve, was 0.91. The 6CIT-P has high reliability and validity and is accurate when used to screen for cognitive impairment.
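
The cutoff logic behind sensitivity/specificity figures like these can be sketched as follows; the scores, diagnoses and counts below are toy data, not the study's sample:

```python
# Sketch of how a screening cutoff (e.g. score >= cutoff flags
# impairment) yields sensitivity and specificity against a reference
# diagnosis. Hypothetical data, not the 6CIT-P validation sample.

def screen_stats(scores, impaired, cutoff):
    """scores: screening scores; impaired: True/False reference diagnosis."""
    tp = sum(1 for s, d in zip(scores, impaired) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, impaired) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, impaired) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, impaired) if not d and s >= cutoff)
    sensitivity = tp / (tp + fn)   # flagged among the truly impaired
    specificity = tn / (tn + fp)   # cleared among the truly unimpaired
    return sensitivity, specificity

scores = [12, 11, 9, 14, 3, 5, 10, 2]
impaired = [True, True, True, True, False, False, False, False]
print(screen_stats(scores, impaired, cutoff=10))  # (0.75, 0.75)
```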

  20. Development and Testing of a 3-Item Screening Tool for Problematic Internet Use.

    Science.gov (United States)

    Moreno, Megan A; Arseniev-Koehler, Alina; Selkie, Ellen

    2016-09-01

    To develop and validate the Problematic and Risky Internet Use Screening Scale (PRIUSS)-3, a short scale to screen for Problematic Internet Use. This scale development study applied standard processes using separate samples for training and testing datasets. We recruited participants from schools and colleges in 6 states and 2 countries. We selected 3 initial versions of a PRIUSS-3 using correlation to the PRIUSS-18 score. We evaluated these 3 potential screening scales for conceptual coherence, factor loading, sensitivity, and specificity. We selected a 3-item screening tool and evaluated it in 2 separate testing sets using receiver operating characteristic (ROC) curves. Our study sample included 1079 adolescents and young adults. The PRIUSS-3 included items addressing anxiety when away from the Internet, loss of motivation when on the Internet, and feelings of withdrawal when away from the Internet. This screening scale had a sensitivity of 100% and specificity of 69%. A score of ≥3 on the PRIUSS-3 was the threshold to follow up with the PRIUSS-18. Similar to other clinical screening tools, the PRIUSS-3 can be administered quickly in a clinical or research setting. Positive screens should be followed by administering the full PRIUSS-18. Given the pervasive presence of the Internet in youth's lives, screening and counseling for Problematic Internet Use can be facilitated by use of this validated screening tool.

  1. Sleep Can Reduce the Testing Effect: It Enhances Recall of Restudied Items but Can Leave Recall of Retrieved Items Unaffected

    Science.gov (United States)

    Bäuml, Karl-Heinz T.; Holterman, Christoph; Abel, Magdalena

    2014-01-01

    The testing effect refers to the finding that retrieval practice in comparison to restudy of previously encoded contents can improve memory performance and reduce time-dependent forgetting. Naturally, long retention intervals include both wake and sleep delay, which can influence memory contents differently. In fact, sleep immediately after…

  3. Development of an item bank for the EORTC Role Functioning Computer Adaptive Test (EORTC RF-CAT)

    DEFF Research Database (Denmark)

    Gamper, Eva-Maria; Petersen, Morten Aa.; Aaronson, Neil

    2016-01-01

    BACKGROUND: Role functioning (RF) as a core construct of health-related quality of life (HRQOL) comprises aspects of occupational and social roles relevant for patients in all treatment phases as well as for survivors. The objective of the current study was to improve its assessment by developing......, and evaluation of the psychometric performance of the RF-CAT. RESULTS: Phases I-III yielded a list of 12 items eligible for phase IV field-testing. The field-testing sample included 1,023 patients from Austria, Denmark, Italy, and the UK. Psychometric evaluation and item response theory analyses yielded 10 items...

  4. Too Good to be Used: Analyzing Utilization of the Test Program for Certain Commercial Items in the Air Force

    Science.gov (United States)

    2014-12-01

    openness of a combined synopsis-solicitation under the Test Program for Certain Commercial Items. c. Benefit #3: Greater Efficiencies. In terms of... radiological attack; or (3) the acquisition does not exceed the threshold and can be treated as an acquisition of commercial items in accordance with FAR... chemical, or radiological attack. (p. 56) This final sort shows the total actions eligible to use FAR Subpart 13.5, Test Program for Certain

  5. Performance on large-scale science tests: Item attributes that may impact achievement scores

    Science.gov (United States)

    Gordon, Janet Victoria

    , characteristics of test items themselves and/or opportunities to learn. Suggestions for future research are made.

  6. Interpreting gains and losses in conceptual test using Item Response Theory

    CERN Document Server

    Lamine, Brahim

    2015-01-01

    Conceptual tests are widely used by physics instructors to assess students' conceptual understanding and compare teaching methods. It is common to look at students' changes in their answers between a pre-test and a post-test to quantify a transition in students' conceptions. This is often done by looking at the proportion of incorrect answers in the pre-test that change to correct answers in the post-test -- the gain -- and the proportion of correct answers that change to incorrect answers -- the loss. By comparing theoretical predictions to experimental data on the Force Concept Inventory, we show that Item Response Theory (IRT) is able to predict the observed gains and losses fairly well. We then use IRT to quantify students' changes in a test-retest situation when no learning occurs and show that (i) up to 25% of total answers can change due to the non-deterministic nature of students' answers and that (ii) gains and losses can go from 0% to 100%. Still using IRT, we highlight the conditions tha...
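
The test-retest point above can be illustrated with a small sketch: if the two attempts are independent draws with the same IRT probability p of a correct response (no learning), an answer changes with probability 2p(1-p). The 3PL parameter values below are invented for illustration:

```python
# Hypothetical illustration of answer changes under IRT with no
# learning. p_correct_3pl is the standard three-parameter logistic
# item response function; the parameter values are made up.
import math

def p_correct_3pl(theta, a, b, c):
    """P(correct) for ability theta, discrimination a, difficulty b, guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_answer_change(p):
    """P(the answer flips between two independent attempts), = 2p(1-p)."""
    return 2.0 * p * (1.0 - p)

p = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)   # = 0.6 here
print(round(p, 2), round(p_answer_change(p), 2))     # 0.6 0.48
```

Note that the change probability peaks at 50% when p = 0.5, which is why a sizeable fraction of answers can flip purely by chance.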

  7. Analysing Item Position Effects due to Test Booklet Design within Large-Scale Assessment

    Science.gov (United States)

    Hohensinn, Christine; Kubinger, Klaus D.; Reif, Manuel; Schleicher, Eva; Khorramdel, Lale

    2011-01-01

    For large-scale assessments, usually booklet designs administering the same item at different positions within a booklet are used. Therefore, the occurrence of position effects influencing the difficulty of the item is a crucial issue. Not taking learning or fatigue effects into account would result in a bias of estimated item difficulty. The…

  8. Evaluation of an Item Bank for a Computerized Adaptive Test of Activity in Children With Cerebral Palsy

    Science.gov (United States)

    Haley, Stephen M.; Fragala-Pinkham, Maria A.; Dumas, Helene M.; Ni, Pengsheng; Gorton, George E.; Watson, Kyle; Montpetit, Kathleen; Bilodeau, Nathalie; Hambleton, Ronald K.; Tucker, Carole A.

    2009-01-01

    Background: Contemporary clinical assessments of activity are needed across the age span for children with cerebral palsy (CP). Computerized adaptive testing (CAT) has the potential to efficiently administer items for children across wide age spans and functional levels. Objective: The objective of this study was to examine the psychometric properties of a new item bank and simulated computerized adaptive test to assess activity level abilities in children with CP. Design: This was a cross-sectional item calibration study. Methods: The convenience sample consisted of 308 children and youth with CP, aged 2 to 20 years (X=10.7, SD=4.0), recruited from 4 pediatric hospitals. We collected parent-report data on an initial set of 45 activity items. Using an Item Response Theory (IRT) approach, we compared estimated scores from the activity item bank with concurrent instruments, examined discriminate validity, and developed computer simulations of a CAT algorithm with multiple stop rules to evaluate scale coverage, score agreement with CAT algorithms, and discriminant and concurrent validity. Results: Confirmatory factor analysis supported scale unidimensionality, local item dependence, and invariance. Scores from the computer simulations of the prototype CATs with varying stop rules were consistent with scores from the full item bank (r=.93–.98). The activity summary scores discriminated across levels of upper-extremity and gross motor severity and were correlated with the Pediatric Outcomes Data Collection Instrument (PODCI) physical function and sports subscale (r=.86), the Functional Independence Measure for Children (Wee-FIM) (r=.79), and the Pediatric Quality of Life Inventory–Cerebral Palsy version (r=.74). Limitations: The sample size was small for such IRT item banks and CAT development studies. Another limitation was oversampling of children with CP at higher functioning levels. Conclusions: The new activity item bank appears to have promise for use in a CAT
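
The CAT loop behind simulations like those above can be sketched minimally; the item bank, the shrinking-step ability update, and the fixed-length stop rule below are simplified assumptions, not the study's algorithm:

```python
# Minimal CAT sketch: pick the unadministered item whose difficulty is
# nearest the current ability estimate, update the estimate from the
# response, and stop after a fixed number of items (one possible stop
# rule). All parameters are hypothetical.

def run_cat(difficulties, answer, max_items=5):
    """answer(b) -> 0/1 simulated response to an item of difficulty b."""
    theta = 0.0
    remaining = list(range(len(difficulties)))
    for step in range(max_items):
        # Item selection: difficulty closest to the ability estimate.
        i = min(remaining, key=lambda j: abs(difficulties[j] - theta))
        remaining.remove(i)
        correct = answer(difficulties[i])
        # Crude stand-in for maximum-likelihood updating: step up on a
        # correct answer, down on an incorrect one, with shrinking steps.
        theta += (1.0 if correct else -1.0) / (step + 1)
    return theta

bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
# Deterministic examinee: answers items easier than 0.8 correctly.
estimate = run_cat(bank, answer=lambda b: 1 if b < 0.8 else 0)
print(round(estimate, 2))  # 0.78
```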

  9. A Case Study on an Item Writing Process: Use of Test Specifications, Nature of Group Dynamics, and Individual Item Writers' Characteristics

    Science.gov (United States)

    Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa

    2010-01-01

    This article discusses a case study on an item writing process that reflects on our practical experience in an item development project. The purpose of the article is to share our lessons from the experience aiming to demystify item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…

  10. Assessing the discriminating power of item and test scores in the linear factor-analysis model

    Directory of Open Access Journals (Sweden)

    Pere J. Ferrando

    2012-01-01

    Full Text Available Rigorous, psychometric-model-based proposals for studying the imprecise concept of "discriminating power" are scarce and generally limited to nonlinear models for binary items. This article proposes a general framework for assessing the discriminating power of item and test scores that are calibrated with the common-factor model. The proposal is organized around three criteria: (a) type of score, (b) range of discrimination, and (c) specific aspect assessed. Within the proposed framework: (a) 16 measures are discussed, of which 6 appear to be new, and (b) the relations among them are studied. The usefulness of the proposal in psychometric applications that use the factor model is illustrated with an empirical example.

  11. The 20 item prosopagnosia index (PI20): relationship with the Glasgow face-matching test.

    Science.gov (United States)

    Shah, Punit; Sowden, Sophie; Gaule, Anne; Catmur, Caroline; Bird, Geoffrey

    2015-11-01

    The 20 item prosopagnosia index (PI20) was recently developed to identify individuals with developmental prosopagnosia. While the PI20's principal purpose is to aid researchers and clinicians, it was suggested that it may serve as a useful screening tool to identify people with face recognition difficulties in applied settings where face matching is a critical part of their occupation. Although the PI20 has been validated using behavioural measures of face recognition, it has yet to be validated against a measure of face-matching ability that is more representative of applied settings. In this study, the PI20 was therefore administered with the Glasgow face-matching test (GFMT). A strong correlation was observed between PI20 and GFMT scores, providing further validation for the PI20, indicating that it is likely to be of value in applied settings.

  12. Determination of radionuclides in environmental test items at CPHR: traceability and uncertainty calculation.

    Science.gov (United States)

    Carrazana González, J; Fernández, I M; Capote Ferrera, E; Rodríguez Castro, G

    2008-11-01

    Information about how the laboratory of Centro de Protección e Higiene de las Radiaciones (CPHR), Cuba establishes its traceability to the International System of Units for the measurement of radionuclides in environmental test items is presented. A comparison among different methodologies of uncertainty calculation, including an analysis of the feasibility of using the Kragten-spreadsheet approach, is shown. In the specific case of the gamma spectrometric assay, the influence of each parameter, and the identification of the major contributor, in the relative difference between the methods of uncertainty calculation (Kragten and partial derivative) is described. The reliability of the uncertainty calculation results reported by the commercial software Gamma 2000 from Silena is analyzed.

  13. Determination of radionuclides in environmental test items at CPHR: Traceability and uncertainty calculation

    Energy Technology Data Exchange (ETDEWEB)

    Carrazana Gonzalez, J. [Centro de Proteccion e Higiene de las Radiaciones, P.O. Box 6195, La Habana (Cuba)], E-mail: carrazana@cphr.edu.cu; Fernandez, I.M.; Capote Ferrera, E.; Rodriguez Castro, G. [Centro de Proteccion e Higiene de las Radiaciones, P.O. Box 6195, La Habana (Cuba)

    2008-11-15

    Information about how the laboratory of Centro de Proteccion e Higiene de las Radiaciones (CPHR), Cuba establishes its traceability to the International System of Units for the measurement of radionuclides in environmental test items is presented. A comparison among different methodologies of uncertainty calculation, including an analysis of the feasibility of using the Kragten-spreadsheet approach, is shown. In the specific case of the gamma spectrometric assay, the influence of each parameter, and the identification of the major contributor, in the relative difference between the methods of uncertainty calculation (Kragten and partial derivative) is described. The reliability of the uncertainty calculation results reported by the commercial software Gamma 2000 from Silena is analyzed.

  14. Developing a Numerical Ability Test for Students of Education in Jordan: An Application of Item Response Theory

    Science.gov (United States)

    Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader

    2016-01-01

    The purpose of the present study was to develop a test to measure the numerical ability of students of education. The sample of the study consisted of 504 students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed acceptable psychometric properties of the test;…

  15. Variance Difference between Maximum Likelihood Estimation Method and Expected A Posteriori Estimation Method Viewed from Number of Test Items

    Science.gov (United States)

    Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.

    2016-01-01

    The aim of this study is to determine the variance difference between the maximum likelihood and expected a posteriori estimation methods viewed from the number of test items of an aptitude test. The variance represents the accuracy achieved by both the maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…

  16. The 15-item version of the Boston Naming Test as an index of English proficiency.

    Science.gov (United States)

    Erdodi, Laszlo A; Jongsma, Katherine A; Issa, Meriam

    2017-01-01

    The present study was designed to examine the potential of the Boston Naming Test - Short Form (BNT-15) to provide an objective estimate of English proficiency. A secondary goal was to examine the effect of limited English proficiency (LEP) on neuropsychological test performance. A brief battery of neuropsychological tests was administered to 79 bilingual participants (40.5% male, mean age = 26.9, mean education = 14.2 years). The majority (n = 56) were English dominant (EN), and the rest were Arabic dominant (AR). The BNT-15 was further reduced to 10 items that best discriminated between EN and AR (BNT-10). Participants were divided into low, intermediate, and high English proficiency subsamples based on BNT-10 scores (≤6, 7-8, and ≥9). Performance across groups was compared on neuropsychological tests with high and low verbal mediation. The BNT-15 and BNT-10 respectively correctly identified 89 and 90% of EN and AR participants. Level of English proficiency had a large effect (partial η² = .12-.34; Cohen's d = .67-1.59) on tests with high verbal mediation (animal fluency, sentence comprehension, word reading), but no effect on tests with low verbal mediation (auditory consonant trigrams, clock drawing, digit-symbol substitution). The BNT-15 and BNT-10 can function as indices of English proficiency and predict the deleterious effect of LEP on neuropsychological tests with high verbal mediation. Interpreting low scores on such measures as evidence of impairment in examinees with LEP would likely overestimate deficits.

  17. Performance of Certification and Recertification Examinees on Multiple Choice Test Items: Does Physician Age Have an Impact?

    Science.gov (United States)

    Shen, Linjun; Juul, Dorthea; Faulkner, Larry R

    2016-01-01

    The development of recertification programs (now referred to as Maintenance of Certification or MOC) by the members of the American Board of Medical Specialties provides the opportunity to study knowledge base across the professional lifespan of physicians. Research results to date are mixed with some studies finding negative associations between age and various measures of competency and others finding no or minimal relationships. Four groups of multiple choice test items that were independently developed for certification and MOC examinations in psychiatry and neurology were administered to certification and MOC examinees within each specialty. Percent correct scores were calculated for each examinee. Differences between certification and MOC examinees were compared using unpaired t tests, and logistic regression was used to compare MOC and certification examinee performance on the common test items. Except for the neurology certification test items that addressed basic neurology concepts, the performance of the certification and MOC examinees was similar. The differences in performance on individual test items did not consistently favor one group or the other and could not be attributed to any distinguishable content or format characteristics of those items. The findings of this study are encouraging in that physicians who had recently completed residency training possessed clinical knowledge that was comparable to that of experienced physicians, and the experienced physicians' clinical knowledge was equivalent to that of recent residency graduates. The role testing can play in enhancing expertise is described.

  18. [Relationship between recognition judgments and confidence ratings for repeated test items].

    Science.gov (United States)

    Takahashi, Akira

    2008-12-01

    Eighty-nine participants performed a set of recognition judgment and confidence rating tasks twice. Half of the new items presented in the second task had already been presented as old items to the participants in the first task. Analysis of the second-task responses showed a positive correlation between confidence and performance for the old items, and a negative correlation for the new items. In particular, a strong negative correlation was observed when items presented in the first task were presented as "new" in the second task. This negative correlation reflects a "source monitoring error," whereby the participants falsely recognized the items from the first task as those presented in the second, because they were unaware of making source misattributions.

  19. Specificity data for the b Test, Dot Counting Test, Rey-15 Item Plus Recognition, and Rey Word Recognition Test in monolingual Spanish-speakers.

    Science.gov (United States)

    Robles, Luz; López, Enrique; Salazar, Xavier; Boone, Kyle B; Glaser, Debra F

    2015-01-01

    The current study provides specificity data on a large sample (n = 115) of young to middle-aged, male, monolingual Spanish speakers of lower educational level and low acculturation to mainstream US culture for four neurocognitive performance validity tests (PVTs): the Dot Counting, the b Test, Rey Word Recognition, and Rey 15-Item Plus Recognition. Individuals with 0 to 6 years of education performed more poorly than did participants with 7 to 10 years of education on several Rey 15-Item scores (combination equation, recall intrusion errors, and recognition false positives), Rey Word Recognition total correct, and E-score and omission errors on the b Test, but no effect of educational level was observed for Dot Counting Test scores. Cutoff scores are provided that maintain approximately 90% specificity for the education subgroups separately. Some of these cutoffs match, or are even more stringent than, those recommended for use in US test takers who are primarily Caucasian, are tested in English, and have a higher educational level (i.e., Rey Word Recognition correct false-positive errors; Rey 15-Item recall intrusions and recognition false-positive errors; b Test total time; and Dot Counting E-score and grouped dot counting time). Thus, performance on these PVT variables in particular appears relatively robust to cultural/language/educational factors.

  20. Developing multiple-choices test items as tools for measuring the scientific-generic skills on solar system

    Science.gov (United States)

    Bhakti, Satria Seto; Samsudin, Achmad; Chandra, Didi Teguh; Siahaan, Parsaoran

    2017-05-01

    The aim of this research is to develop multiple-choice test items as tools for measuring scientific generic skills on the solar system. To achieve this aim, the researchers used the ADDIE model, consisting of Analyzing, Design, Development, Implementation, and Evaluation, as the research method. The scientific generic skills were limited to five indicators: (1) indirect observation, (2) awareness of scale, (3) logical inference, (4) causal relation, and (5) mathematical modeling. The participants were 32 students at a junior high school in Bandung. The results showed that the constructed multiple-choice test items were declared valid by the expert validator and, after testing, the developed multiple-choice test items proved able to measure scientific generic skills on the solar system.

  1. Item and test analysis to identify quality multiple choice questions (MCQs) from an assessment of medical students of Ahmedabad, Gujarat

    Directory of Open Access Journals (Sweden)

    Sanju Gajjar

    2014-01-01

    Full Text Available Background: Multiple choice questions (MCQs) are frequently used to assess students in different educational streams for their objectivity and wide reach of coverage in less time. However, the MCQs to be used must be of quality, which depends upon the difficulty index (DIF I), discrimination index (DI) and distracter efficiency (DE). Objective: To evaluate MCQs or items and develop a pool of valid items by assessing them with DIF I, DI and DE, and also to revise/store or discard items based on the obtained results. Settings: The study was conducted in a medical school of Ahmedabad. Materials and Methods: An internal examination in Community Medicine was conducted after 40 hours of teaching during 1st MBBS, which was attended by 148 out of 150 students. A total of 50 MCQs or items and 150 distractors were analyzed. Statistical Analysis: Data were entered and analyzed in MS Excel 2007; simple proportions, means, standard deviations and coefficients of variation were calculated, and the unpaired t test was applied. Results: Out of 50 items, 24 had "good to excellent" DIF I (31-60%) and 15 had "good to excellent" DI (> 0.25). Mean DE was 88.6%, considered ideal/acceptable, and non-functional distractors (NFDs) were only 11.4%. Mean DI was 0.14. Poor DI (< 0.15), with negative DI in 10 items, indicates poor preparedness of students and some issues with the framing of at least some of the MCQs. An increased proportion of NFDs (incorrect alternatives selected by < 5% of students) in an item decreases its DE and makes it easier. There were 15 items with 17 NFDs, while the rest of the items did not have any NFD, with a mean DE of 100%. Conclusion: The study emphasizes the selection of quality MCQs which truly assess the knowledge and are able to differentiate students of different abilities in the correct manner.
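
The three classical indices the study reports can be sketched in Python; the 27% group split is a common convention, and the scores, counts and thresholds below are illustrative toy data, not the study's:

```python
# Sketch of classical item analysis: difficulty index (DIF I),
# discrimination index (DI) and distracter efficiency (DE).

def item_indices(total_scores, item_correct, group_frac=0.27):
    """total_scores: per-student test totals; item_correct: 0/1 per student."""
    n = len(total_scores)
    # DIF I: percentage of students answering the item correctly.
    dif_i = 100.0 * sum(item_correct) / n
    # Rank students by total score; compare the upper vs. lower groups.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    k = max(1, round(n * group_frac))
    upper = sum(item_correct[i] for i in order[:k])
    lower = sum(item_correct[i] for i in order[-k:])
    di = (upper - lower) / k   # DI ranges over [-1, 1]
    return dif_i, di

def distracter_efficiency(distracter_counts, n_students, nfd_cutoff=0.05):
    """A distracter picked by < 5% of students is non-functional (NFD)."""
    nfd = sum(1 for c in distracter_counts if c / n_students < nfd_cutoff)
    return 100.0 * (len(distracter_counts) - nfd) / len(distracter_counts)

totals = [10, 9, 8, 8, 7, 6, 5, 5, 4, 2]   # toy totals for 10 students
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]      # toy responses to one item
print(item_indices(totals, item))           # (50.0, 1.0)
print(round(distracter_efficiency([20, 10, 2], 148), 1))  # 66.7 (one NFD)
```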

  2. Teachers' Use of Test-Item Banks for Student Assessment in North Carolina Secondary Agricultural Education Programs

    Science.gov (United States)

    Marshall, Joy Morgan

    2014-01-01

    Higher expectations are on all parties to ensure students successfully perform on standardized tests. Specifically in North Carolina agriculture classes, students are given a CTE Post Assessment to measure knowledge gained and proficiency. Prior to students taking the CTE Post Assessment, teachers have access to a test item bank system that…

  3. Set of Criteria for Efficiency of the Process Forming the Answers to Multiple-Choice Test Items

    Science.gov (United States)

    Rybanov, Alexander Aleksandrovich

    2013-01-01

    A set of criteria is offered for assessing the efficiency of the process of forming answers to multiple-choice test items. To increase the accuracy of computer-assisted testing results, it is suggested to assess the dynamics of the process of forming the final answer using the following factors: a loss-of-time factor and a correct-choice factor. The model…

  5. An Exploratory Study of the Applicability of Item Response Theory Methods to the Graduate Management Admission Test.

    Science.gov (United States)

    Kingston, Neal; And Others

    A necessary prerequisite to the operational use of item response theory (IRT) in any testing program is the investigation of the feasibility of such an approach. This report presents the results of such research for the Graduate Management Admission Test (GMAT). Despite the fact that GMAT data appear to violate a basic assumption of the…

  6. Using Necessary Information to Identify Item Dependence in Passage-Based Reading Comprehension Tests

    Science.gov (United States)

    Baldonado, Angela Argo; Svetina, Dubravka; Gorin, Joanna

    2015-01-01

    Applications of traditional unidimensional item response theory models to passage-based reading comprehension assessment data have been criticized based on potential violations of local independence. However, simple rules for determining dependency, such as including all items associated with a particular passage, may overestimate the dependency…

  7. Multiple-choice versus open-ended response formats of reading test items: A two-dimensional IRT analysis

    Directory of Open Access Journals (Sweden)

    Dominique P. Rauch

    2010-12-01

    Full Text Available The dimensionality of a reading comprehension assessment with non-stem-equivalent multiple-choice (MC) items and open-ended (OE) items was analyzed with German test data from 8523 9th-graders. We found that a two-dimensional IRT model with within-item multidimensionality, where MC and OE items load on a general latent dimension and OE items additionally load on a nested latent dimension, had a superior fit compared to a unidimensional model (p ≤ .05). Correlations of general cognitive abilities, orthography and vocabulary with the general latent dimension were significantly higher than with the nested latent dimension (p ≤ .05). Drawing on experimental studies of the effect of item format on reading processes, we suppose that the general latent dimension measures abilities necessary to master basic reading processes and the nested latent dimension captures abilities necessary to master higher reading processes. Including gender, language spoken at home, and school track as predictors in latent regression models showed that the well-known advantage of girls and mother-tongue students is found only for the nested latent dimension.

  8. The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

    Science.gov (United States)

    Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

    2017-05-01

    The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) models to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants on the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
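
The Rasch (1-PL) model favoured by the analysis above has a simple closed form: the probability of a correct response depends only on the difference between ability and item difficulty. A minimal sketch (function names are ours):

```python
# Sketch of the Rasch (1-PL) item response model and the per-pattern
# log-likelihood that marginal maximum likelihood estimation builds on.
import math

def rasch_p(theta, b):
    """P(correct) for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, difficulties, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    ll = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        ll += math.log(p) if x else math.log(1.0 - p)
    return ll

# An examinee matched to an item of equal difficulty answers it
# correctly with probability exactly 0.5.
print(rasch_p(0.0, 0.0))  # 0.5
```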

  9. Developing and testing items for the South African Personality Inventory (SAPI)

    Directory of Open Access Journals (Sweden)

    Carin Hill

    2013-01-01

    Full Text Available Orientation: A multicultural country like South Africa needs fair cross-cultural psychometric instruments. Research purpose: This article reports on the process of identifying items for, and provides a quantitative evaluation of, the South African Personality Inventory (SAPI) items. Motivation for the study: The study intended to develop an indigenous and psychometrically sound personality instrument that adheres to the requirements of South African legislation and excludes cultural bias. Research design, approach and method: The authors used a cross-sectional design. They measured the nine SAPI clusters identified in the qualitative stage of the SAPI project in 11 separate quantitative studies. Convenience sampling yielded 6735 participants. Statistical analysis focused on the construct validity and reliability of items. The authors eliminated items that showed poor performance, based on common psychometric criteria, and selected the best-performing items to form part of the final version of the SAPI. Main findings: The authors developed 2573 items from the nine SAPI clusters. Of these, 2268 items were valid and reliable representations of the SAPI facets. Practical/managerial implications: The authors developed a large item pool that measures personality in South Africa and that researchers can refine for the SAPI. Furthermore, the project illustrates an approach that researchers can use in projects that aim to develop culturally informed psychological measures. Contribution/value-add: Personality assessment is important for recruiting, selecting and developing employees. This study contributes to current knowledge about the early processes researchers follow when they develop a personality instrument that measures personality fairly across different cultural groups, as the SAPI does.

  10. Development of an item bank and computer adaptive test for role functioning

    DEFF Research Database (Denmark)

    Anatchkova, Milena D; Rose, Matthias; Ware, John E

    2012-01-01

    Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure the impact of health on role functioning.

  11. Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

    Science.gov (United States)

    Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye

    2013-03-01

    Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining the presence of DIF and, if DIF is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Think-aloud protocols with expert reviewers were conducted to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multilingual versions of curriculum and testing materials for government purposes participated in this study. Although there was a considerable amount of agreement in the identification of differentially functioning items, experts did not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors as sources of DIF. Implications are provided for the process of identifying DIF prior to the actual administration of tests at national and international levels.

  12. Differential Item Functioning in While-Listening Performance Tests: The Case of the International English Language Testing System (IELTS) Listening Module

    Science.gov (United States)

    Aryadoust, Vahid

    2012-01-01

    This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…

  13. A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

    Science.gov (United States)

    Yao, Lihua; Schwarz, Richard D.

    2006-01-01

    Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…

  14. Development and Reliability of Items Measuring the Nonmedical Use of Prescription Drugs for the Youth Risk Behavior Survey: Results From an Initial Pilot Test

    Science.gov (United States)

    Howard, Melissa M.; Weiler, Robert M.; Haddox, J. David

    2009-01-01

    Background: The purpose of this study was to develop and test the reliability of self-report survey items designed to monitor the nonmedical use of prescription drugs among adolescents. Methods: Eighteen nonmedical prescription drug items designed to be congruent with the substance abuse items in the US Centers for Disease Control and Prevention's…

  15. Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

    Science.gov (United States)

    Solano-Flores, Guillermo

    2014-01-01

    This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…

  16. The EORTC emotional functioning computerized adaptive test: phases I-III of a cross-cultural item bank development

    NARCIS (Netherlands)

    Gamper, E.M.; Groenvold, M.; Petersen, M.; Young, T.; Constantini, A.; Aaronson, N.; Giesinger, J.; Meraner, V.; Kemmler, G.; Holzner, B.

    2014-01-01

    Background: The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is currently developing computerized adaptive testing measures for the Quality of Life Questionnaire Core-30 (QLQ-C30) scales. The work presented here describes the development of an EORTC item bank for emotional functioning.

  17. Examining the Stability of the 7-Item Social Physique Anxiety Scale Using a Test-Retest Method

    Science.gov (United States)

    Scott, Lisa A.; Burke, Kevin L.; Joyner, A. Barry; Brand, Jennifer S.

    2004-01-01

    This study examined the stability of the 7-item Social Physique Anxiety Scale (SPAS-7) using a test-retest method. Collegiate undergraduate students (N = 201) completed two administrations of the SPAS-7, with a 14-day separation between the administrations. The scale was administered either at the beginning or end of the physical activity class.…
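
Test-retest stability of this kind is typically summarized by the Pearson correlation between the two administrations. A minimal sketch using hypothetical scores, not the study's data:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between paired score lists (e.g., time 1 vs. time 2)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical SPAS-7 totals for five respondents at the two administrations.
time1 = [20, 25, 30, 22, 28]
time2 = [21, 24, 31, 20, 29]
print(round(pearson_r(time1, time2), 3))  # → 0.963
```

A high correlation (conventionally around .70 or above) is taken as evidence of score stability over the retest interval.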

  18. Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Application of the Wald Test to Investigate DIF in the DINA Model

    Science.gov (United States)

    Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna

    2014-01-01

    Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
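
For a single scalar item parameter estimated separately in two groups, the Wald statistic takes a particularly simple form; the CDM application above uses a vector of item parameters and their covariance matrix, which this hypothetical sketch deliberately simplifies:

```python
def wald_statistic(est_ref, se_ref, est_focal, se_focal):
    """Wald chi-square (1 df) for H0: the parameter is equal across groups,
    assuming independent estimates with the given standard errors."""
    diff = est_ref - est_focal
    return diff ** 2 / (se_ref ** 2 + se_focal ** 2)

# Hypothetical estimates of one item parameter in reference vs. focal group.
w = wald_statistic(0.20, 0.02, 0.26, 0.02)
# Compare against the chi-square(1) critical value, 3.841 at alpha = .05.
print(round(w, 2), w > 3.841)  # → 4.5 True
```

Rejecting H0 flags the item as potentially functioning differently across groups; the full vector-valued test aggregates such differences over all of an item's parameters.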

  19. Comparison of the Air Force Officer Qualifying Test Form T and Form S: Initial Item- and Subtest-Level Analyses

    Science.gov (United States)

    2017-03-15

    Technical report presenting initial item- and subtest-level analyses comparing the Air Force Officer Qualifying Test Form T and Form S. Prepared by Imelda D. Aguilar, Air Force Personnel Center, Strategic Research and Assessment Branch (SRAB), for Laura G. Barron, Ph.D., March 2017.

  20. The EORTC emotional functioning computerized adaptive test: phases I-III of a cross-cultural item bank development

    NARCIS (Netherlands)

    E.M. Gamper; M. Groenvold; M. Petersen; T. Young; A. Constantini; N. Aaronson; J. Giesinger; V. Meraner; G. Kemmler; B. Holzner

    2013-01-01

    Background: The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is currently developing computerized adaptive testing measures for the Quality of Life Questionnaire Core-30 (QLQ-C30) scales. The work presented here describes the development of an EORTC item bank for emotional functioning.

  1. Assessment of chromium(VI) release from 848 jewellery items by use of a diphenylcarbazide spot test

    DEFF Research Database (Denmark)

    Bregnbak, David; Johansen, Jeanne D.; Hamann, Dathan;

    2016-01-01

    We recently evaluated and validated a diphenylcarbazide (DPC)-based screening spot test that can detect the release of chromium(VI) ions (≥0.5 ppm) from various metallic items and leather goods (1). We then screened a selection of metal screws, leather shoes, and gloves, as well as 50 earrings...

  2. An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

    Science.gov (United States)

    Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

    2013-01-01

    Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

  3. Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

    Science.gov (United States)

    Solano-Flores, Guillermo

    2014-01-01

    This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…

  4. Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

    Science.gov (United States)

    de la Torre, Jimmy; Lee, Young-Sun

    2013-01-01

    This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…

  6. Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

    Science.gov (United States)

    Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

    2017-07-13

    The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of the HRQOL of cancer patients, without loss of comparability of results.
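
The efficiency gain described above comes from the core CAT loop: at each step, administer the item that is most informative at the current ability estimate. A minimal sketch of that selection rule under a 2-PL model, with made-up item parameters (a production CAT engine would add exposure control, content balancing, and stopping rules):

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2-PL item at ability theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, items, administered):
    """Pick the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta, *items[i]))

# Hypothetical item bank: (discrimination a, difficulty b) per item.
items = [(1.0, -2.0), (1.5, 0.0), (0.8, 2.0)]
print(next_item(theta=0.1, items=items, administered=set()))  # → 1
```

Here the moderately difficult, highly discriminating item 1 wins for an examinee near theta = 0; after it is administered and theta is re-estimated, the loop repeats.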

  7. A 6-item scale for overall, emotional and social loneliness: confirmatory tests on survey data

    NARCIS (Netherlands)

    de Jong Gierveld, J.; van Tilburg, T.

    2006-01-01

    Loneliness is an indicator of social well-being and pertains to the feeling of missing an intimate relationship (emotional loneliness) or missing a wider social network (social loneliness). The 11-item De Jong Gierveld Loneliness Scale has proved to be a valid and reliable measurement instrument for

  8. Sampling of Common Items: An Unrecognized Source of Error in Test Equating. CSE Report 636

    Science.gov (United States)

    Michaelides, Michalis P.; Haertel, Edward H.

    2004-01-01

    There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…

  9. Anatomy of a physics test: Validation of the physics items on the Texas Assessment of Knowledge and Skills

    Directory of Open Access Journals (Sweden)

    Jill A. Marshall

    2009-03-01

    Full Text Available We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.

  10. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

    Science.gov (United States)

    Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

    2016-01-01

    The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as did 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty indices (P<0.01), as did all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
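
In classical terms, the difficulty index being compared here is simply the proportion of examinees who answer an item correctly, so higher values mean easier items. A minimal sketch with made-up response data:

```python
def difficulty_index(responses):
    """Classical difficulty index: proportion correct (higher = easier)."""
    return sum(responses) / len(responses)

# Hypothetical 0/1 scores of ten examinees on one skills-test item.
item_scores = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
print(difficulty_index(item_scores))  # → 0.7
```

Comparing these proportions across the randomly assigned items is what reveals the unequal difficulty the study reports.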

  11. Effect of Adjusting Pseudo-Guessing Parameter Estimates on Test Scaling When Item Parameter Drift Is Present

    Directory of Open Access Journals (Sweden)

    Kyung T. Han

    2015-07-01

    Full Text Available In item response theory test scaling/equating with the three-parameter model, the scaling coefficients A and B have no impact on the c-parameter estimates of the test items, since the c-parameter estimates are not adjusted in the scaling/equating procedure. The main research question in this study concerned how serious the consequences would be if c-parameter estimates were not adjusted in the test equating procedure when item-parameter drift (IPD) is present. This drift is commonly observed in equating studies and hence has been the source of considerable research. The results from a series of Monte Carlo simulation studies conducted under 32 different combinations of conditions showed that some calibration strategies in the study, where the c-parameters were adjusted to be identical across two test forms, resulted in more robust equating performance in the presence of IPD. This paper discusses the practical effectiveness and the theoretical importance of appropriately adjusting c-parameter estimates in equating.
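
For context, a common linear scaling transformation rescales the discrimination and difficulty parameters with the coefficients A and B while, as the abstract notes, conventionally leaving the c (pseudo-guessing) parameter unchanged. A sketch with illustrative parameter values:

```python
def scale_3pl(a, b, c, A, B):
    """Put 3-PL item parameters (a, b, c) on the target scale:
    a* = a / A, b* = A * b + B, c* = c (conventionally unadjusted)."""
    return a / A, A * b + B, c

# Hypothetical item and scaling coefficients.
a_new, b_new, c_new = scale_3pl(a=1.2, b=0.5, c=0.2, A=1.1, B=-0.3)
print(round(a_new, 4), round(b_new, 2), c_new)  # → 1.0909 0.25 0.2
```

The study's question is precisely whether leaving c* = c, as in this conventional transformation, remains defensible when item-parameter drift is present.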

  12. Controlling Type I Error Rate in Evaluating Differential Item Functioning for Four DIF Methods: Use of Three Procedures for Adjustment of Multiple Item Testing

    Science.gov (United States)

    Kim, Jihye

    2010-01-01

    In DIF studies, a Type I error refers to the mistake of identifying non-DIF items as DIF items, and a Type I error rate refers to the proportion of Type I errors in a simulation study. The possibility of making a Type I error in DIF studies is always present and high possibility of making such an error can weaken the validity of the assessment.…

  13. The 20-item prosopagnosia index (PI20): relationship with the Glasgow face-matching test

    OpenAIRE

    Shah, Punit; Sowden, Sophie; Gaule, Anne; Catmur, Caroline; Bird, Geoffrey

    2015-01-01

    The 20 item prosopagnosia index (PI20) was recently developed to identify individuals with developmental prosopagnosia. While the PI20's principal purpose is to aid researchers and clinicians, it was suggested that it may serve as a useful screening tool to identify people with face recognition difficulties in applied settings where face matching is a critical part of their occupation. Although the PI20 has been validated using behavioural measures of face recognition, it has yet to be valida...

  14. Realizing a Rasch measurement through instructionally-sequenced domains of test items.

    Science.gov (United States)

    Schulz, E. Matthew

    2016-11-01

    This paper presents results from a project in which instructionally sequenced domains were defined for purposes of constructing measures that conform to an ideal in Guttman scaling and Rasch measurement. A fundamental idea in these measurement systems is that every person higher on the measurement scale can do everything that lower-level persons can do, plus at least one more thing. This idea has had limited application in educational measurement due to the stochastic nature of item response data and the sheer number of items needed to obtain reliable measures. However, it has been shown by Schulz, Lee, and Mullen [1] that this ideal can be realized at a higher level of abstraction, when items within a content strand are aggregated into a small number of domains that are ordered in instructional timing and difficulty. The present paper shows how this was done, and the results, in an achievement-level setting project for the 2007 Grade 12 NAEP Economics Assessment.
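
The Guttman ideal described above implies that, with domains ordered from easiest to hardest, a response vector should never show a success after a failure. A minimal conformance check on hypothetical 0/1 data:

```python
def is_guttman_pattern(responses):
    """True if 0/1 responses, ordered easiest-to-hardest, form a run of
    successes followed only by failures (no 1 after a 0)."""
    seen_failure = False
    for r in responses:
        if r == 1 and seen_failure:
            return False
        if r == 0:
            seen_failure = True
    return True

print(is_guttman_pattern([1, 1, 1, 0, 0]))  # → True
print(is_guttman_pattern([1, 0, 1, 0, 0]))  # → False
```

With stochastic item-level data such perfect patterns are rare, which is why the paper aggregates items into ordered domains before applying the ideal.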

  15. Testing the ruler with item response theory: increasing precision of measurement for relationship satisfaction with the Couples Satisfaction Index.

    Science.gov (United States)

    Funk, Janette L; Rogge, Ronald D

    2007-12-01

    The present study took a critical look at a central construct in couples research: relationship satisfaction. Eight well-validated self-report measures of relationship satisfaction, including the Marital Adjustment Test (MAT; H. J. Locke & K. M. Wallace, 1959), the Dyadic Adjustment Scale (DAS; G. B. Spanier, 1976), and an additional 75 potential satisfaction items, were given to 5,315 online participants. Using item response theory, the authors demonstrated that the MAT and DAS provided relatively poor levels of precision in assessing satisfaction, particularly given the length of those scales. Principal-components analysis and item response theory applied to the larger item pool were used to develop the Couples Satisfaction Index (CSI) scales. Compared with the MAT and the DAS, the CSI scales were shown to have higher precision of measurement (less noise) and correspondingly greater power for detecting differences in levels of satisfaction. The CSI scales demonstrated strong convergent validity with other measures of satisfaction and excellent construct validity with anchor scales from the nomological net surrounding satisfaction, suggesting that they assess the same theoretical construct as do prior scales. Implications for research are discussed.

  16. The effect of Trier Social Stress Test (TSST) on item and associative recognition of words and pictures in healthy participants

    OpenAIRE

    Jonathan Guez; Rotem Saar-Ashkenazy; Eldad Keha; Chen Tiferet

    2016-01-01

    Psychological stress, induced by the Trier Social Stress Test (TSST), has repeatedly been shown to alter memory performance. Although factors influencing memory performance such as stimulus nature (verbal/pictorial) and emotional valence have been extensively studied, results on whether stress impairs or improves memory are still inconsistent. This study aimed at exploring the effect of the TSST on item versus associative memory for neutral, verbal, and pictorial stimuli. 48 healthy subjects were r...

  17. What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling.

    Science.gov (United States)

    Koller, Ingrid; Levenson, Michael R; Glück, Judith

    2017-01-01

    The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis.

  18. Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

    Science.gov (United States)

    Kim, Seonghoon

    2013-01-01

    With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
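
For dichotomous items, the original Lord-Wingersky recursion builds the conditional number-correct distribution one item at a time; the article generalizes this to real-number item scores. A sketch of the dichotomous base case, using illustrative per-item success probabilities at a fixed proficiency:

```python
def lord_wingersky(probs):
    """Conditional distribution of the summed (number-correct) score,
    given each item's P(correct) at a fixed proficiency level."""
    dist = [1.0]  # P(score = 0) before any items are added
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)  # item answered incorrectly
            new[score + 1] += mass * p      # item answered correctly
        dist = new
    return dist

# Two hypothetical items, each with P(correct) = .5 at this proficiency.
print(lord_wingersky([0.5, 0.5]))  # → [0.25, 0.5, 0.25]
```

Each pass convolves the running score distribution with one more item, so the cost grows only linearly in the number of items; the generalized algorithm applies the same convolution to non-integer item scores.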

  19. Improving Content Assessment for English Language Learners: Studies of the Linguistic Modification of Test Items. Research Report. ETS RR-14-23

    Science.gov (United States)

    Young, John W.; King, Teresa C.; Hauck, Maurice Cogan; Ginsburgh, Mitchell; Kotloff, Lauren; Cabrera, Julio; Cavalie, Carlos

    2014-01-01

    This article describes two research studies conducted on the linguistic modification of test items from K-12 content assessments. In the first study, 120 linguistically modified test items in mathematics and science taken by fourth and sixth graders were found to have a wide range of outcomes for English language learners (ELLs) and non-ELLs, with…

  20. Enhanced Automatic Question Creator--EAQC: Concept, Development and Evaluation of an Automatic Test Item Creation Tool to Foster Modern e-Education

    Science.gov (United States)

    Gutl, Christian; Lankmayr, Klaus; Weinhofer, Joachim; Hofler, Margit

    2011-01-01

    Research in automated creation of test items for assessment purposes became increasingly important during the recent years. Due to automatic question creation it is possible to support personalized and self-directed learning activities by preparing appropriate and individualized test items quite easily with relatively little effort or even fully…

  1. Re-Fitting for a Different Purpose: A Case Study of Item Writer Practices in Adapting Source Texts for a Test of Academic Reading

    Science.gov (United States)

    Green, Anthony; Hawkey, Roger

    2012-01-01

    The important yet under-researched role of item writers in the selection and adaptation of texts for high-stakes reading tests is investigated through a case study involving a group of trained item writers working on the International English Language Testing System (IELTS). In the first phase of the study, participants were invited to reflect in…

  2. "Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests": Correction to Fox, Berry, and Freeman (2014).

    Science.gov (United States)

    2016-08-01

    Reports an error in "Are vocabulary tests measurement invariant between age groups? An item response analysis of three popular tests" by Mark C. Fox, Jane M. Berry and Sara P. Freeman (Psychology and Aging, 2014[Dec], Vol 29[4], 925-938). In the article, unneeded zeros were inadvertently included at the beginnings of some numbers in Tables 1–4. In addition, the right column in Table 4 includes three unnecessary zeros after asterisks. (The following abstract of the original article appeared in record 2014-49140-001.) Relatively high vocabulary scores of older adults are generally interpreted as evidence that older adults possess more of a common ability than younger adults. Yet, this interpretation rests on empirical assumptions about the uniformity of item-response functions between groups. In this article, we test item response models of differential responding against datasets containing younger-, middle-aged-, and older-adult responses to three popular vocabulary tests (the Shipley, Ekstrom, and WAIS–R) to determine whether members of different age groups who achieve the same scores have the same probability of responding in the same categories (e.g., correct vs. incorrect) under the same conditions. Contrary to the null hypothesis of measurement invariance, datasets for all three tests exhibit substantial differential responding. Members of different age groups who achieve the same overall scores exhibit differing response probabilities in relation to the same items (differential item functioning) and appear to approach the tests in qualitatively different ways that generalize across items. Specifically, younger adults are more likely than older adults to leave items unanswered for partial credit on the Ekstrom, and to produce 2-point definitions on the WAIS–R. Yet, older adults score higher than younger adults, consistent with most reports of vocabulary outcomes in the cognitive aging literature. In light of these findings, the most generalizable

  3. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

    Directory of Open Access Journals (Sweden)

    Bongyeun Koh

    2016-01-01

    Full Text Available Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE, which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01, as well as 4 of the 5 items on the advanced skills test (P<0.05. In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01, as well as all 3 of the advanced skills test items (P<0.01. Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.

  4. A Comparison of Item-Selection Methods for Adaptive Tests with Content

    NARCIS (Netherlands)

    van der Linden, Wim J.

    2005-01-01

    In test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it l

  5. Reliability and Levels of Difficulty of Objective Test Items in a Mathematics Achievement Test: A Study of Ten Senior Secondary Schools in Five Local Government Areas of Akure, Ondo State

    Science.gov (United States)

    Adebule, S. O.

    2009-01-01

    This study examined the reliability and difficulty indices of Multiple Choice (MC) and True or False (TF) types of objective test items in a Mathematics Achievement Test (MAT). The instruments used were two variants of a 50-item Mathematics Achievement Test based on the multiple-choice and true-or-false test formats. A total of five hundred (500)…

  6. Performance of Accounting students on the Enade/2012 test: an application of the Item-Response Theory

    Directory of Open Access Journals (Sweden)

    Raphael Vinicius Weigert Camargo

    2016-08-01

    Full Text Available The objective of this study was to measure Accounting students’ performance (proficiency) on the Enade test using Item Response Theory (IRT). The students’ performance was measured using the three-parameter logistic model (3PL), based on data related to the Enade test/2012, taken from the website of the National Institute for Educational Studies and Research Anísio Teixeira (Inep), concerning 47,098 students. Through the scale, three levels of student performance could be distinguished. Level 1 students master the reading and interpretation of texts and quantitative reasoning. In addition, Level 2 students should present logical reasoning and a systemic and holistic perspective. Furthermore, at Level 3, students should present interdisciplinary knowledge, covering accounting contents, critical-analytic skills and practical application of the content mastered. The results also showed that the items of the Enade test were very difficult for the group that took the test. Independently of the student characteristics analyzed, overall, the proficiency scores were very low. This result suggests that the HEIs need to take action and that public policies are needed that can contribute to improving the students’ performance.
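The three-parameter logistic model referenced above gives the probability of a correct response as c + (1 - c) / (1 + e^(-a(theta - b))), where c is the pseudo-guessing floor. A minimal sketch (parameter values invented for illustration):

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response.
    a = discrimination, b = difficulty, c = pseudo-guessing lower asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A difficult item (b = 2.0): even an above-average examinee (theta = 1.0)
# succeeds barely above the guessing floor, mirroring a "very difficult test".
print(round(p_3pl(1.0, 1.5, 2.0, 0.2), 3))   # → 0.346
print(round(p_3pl(-1.0, 1.5, 2.0, 0.2), 3))  # → 0.209, close to the floor c = 0.2
```

When most item difficulties sit far above the ability distribution, estimated proficiencies cluster low, which is the pattern the abstract reports.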

  7. Testing whether the DSM-5 personality disorder trait model can be measured with a reduced set of items: An item response theory investigation of the Personality Inventory for DSM-5.

    Science.gov (United States)

    Maples, Jessica L; Carter, Nathan T; Few, Lauren R; Crego, Cristina; Gore, Whitney L; Samuel, Douglas B; Williamson, Rachel L; Lynam, Donald R; Widiger, Thomas A; Markon, Kristian E; Krueger, Robert F; Miller, Joshua D

    2015-12-01

    The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) includes an alternative model of personality disorders (PDs) in Section III, consisting in part of a pathological personality trait model. To date, the 220-item Personality Inventory for DSM-5 (PID-5; Krueger, Derringer, Markon, Watson, & Skodol, 2012) is the only extant self-report instrument explicitly developed to measure this pathological trait model. The present study used item response theory-based analyses in a large sample (n = 1,417) to investigate whether a reduced set of 100 items could be identified from the PID-5 that could measure the 25 traits and 5 domains. This reduced set of PID-5 items was then tested in a community sample of adults currently receiving psychological treatment (n = 109). Across a wide range of criterion variables including NEO PI-R domains and facets, DSM-5 Section II PD scores, and externalizing and internalizing outcomes, the correlational profiles of the original and reduced versions of the PID-5 were nearly identical (rICC = .995). These results provide strong support for the hypothesis that an abbreviated set of PID-5 items can be used to reliably, validly, and efficiently assess these personality disorder traits. The ability to assess the DSM-5 Section III traits using only 100 items has important implications in that it suggests these traits could still be measured in settings in which assessment-related resources (e.g., time, compensation) are limited.

  8. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory

    National Research Council Canada - National Science Library

    Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

    2014-01-01

    ...). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer-Salovey-Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability...

  9. A test of the International Personality Item Pool representation of the Revised NEO Personality Inventory and development of a 120-item IPIP-based measure of the five-factor model.

    Science.gov (United States)

    Maples, Jessica L; Guan, Li; Carter, Nathan T; Miller, Joshua D

    2014-12-01

    There has been a substantial increase in the use of personality assessment measures constructed using items from the International Personality Item Pool (IPIP) such as the 300-item IPIP-NEO (Goldberg, 1999), a representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992). The IPIP-NEO is free to use and can be modified to accommodate its users' needs. Despite the substantial interest in this measure, there is still a dearth of data demonstrating its convergence with the NEO PI-R. The present study represents an investigation of the reliability and validity of scores on the IPIP-NEO. Additionally, we used item response theory (IRT) methodology to create a 120-item version of the IPIP-NEO. Using an undergraduate sample (n = 359), we examined the reliability, as well as the convergent and criterion validity, of scores from the 300-item IPIP-NEO, a previously constructed 120-item version of the IPIP-NEO (Johnson, 2011), and the newly created IRT-based IPIP-120 in comparison to the NEO PI-R across a range of outcomes. Scores from all 3 IPIP measures demonstrated strong reliability and convergence with the NEO PI-R and a high degree of similarity with regard to their correlational profiles across the criterion variables (rICC = .983, .972, and .976, respectively). The replicability of these findings was then tested in a community sample (n = 757), and the results closely mirrored the findings from Sample 1. These results provide support for the use of the IPIP-NEO and both 120-item IPIP-NEO measures as assessment tools for measurement of the five-factor model. (c) 2014 APA, all rights reserved.

  10. The Role of Item Models in Automatic Item Generation

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  11. Developing Testing Accommodations for English Language Learners: Illustrations as Visual Supports for Item Accessibility

    Science.gov (United States)

    Solano-Flores, Guillermo; Wang, Chao; Kachchaf, Rachel; Soltero-Gonzalez, Lucinda; Nguyen-Le, Khanh

    2014-01-01

    We address valid testing for English language learners (ELLs)--students in the United States who are schooled in English while they are still acquiring English as a second language. Also, we address the need for procedures for systematically developing ELL testing accommodations--changes in tests intended to support ELLs to gain access to the…

  12. On the Issue of Item Selection in Computerized Adaptive Testing With Response Times

    NARCIS (Netherlands)

    Veldkamp, Bernard P.

    2016-01-01

    Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed com

  13. Impact of Accumulated Error on Item Response Theory Pre-Equating with Mixed Format Tests

    Science.gov (United States)

    Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F.

    2016-01-01

    The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…

  14. Specificity and false positive rates of the Test of Memory Malingering, Rey 15-item Test, and Rey Word Recognition Test among forensic inpatients with intellectual disabilities.

    Science.gov (United States)

    Love, Christopher M; Glassmire, David M; Zanolini, Shanna Jordan; Wolf, Amanda

    2014-10-01

    This study evaluated the specificity and false positive (FP) rates of the Rey 15-Item Test (FIT), Word Recognition Test (WRT), and Test of Memory Malingering (TOMM) in a sample of 21 forensic inpatients with mild intellectual disability (ID). The FIT demonstrated an FP rate of 23.8% with the standard quantitative cutoff score. Certain qualitative error types on the FIT showed promise and had low FP rates. The WRT obtained an FP rate of 0.0% with previously reported cutoff scores. Finally, the TOMM demonstrated low FP rates of 4.8% and 0.0% on Trial 2 and the Retention Trial, respectively, when applying the standard cutoff score. FP rates are reported for a range of cutoff scores and compared with published research on individuals diagnosed with ID. Results indicated that although the quantitative variables on the FIT had unacceptably high FP rates, the TOMM and WRT had low FP rates, increasing the confidence clinicians can place in scores reflecting poor effort on these measures during ID evaluations.
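A specificity/false-positive analysis like the one above reduces to counting honest responders who fall at or below a cutoff score. A sketch with invented FIT-style scores and cutoff (illustrative only, not the study's data):

```python
def specificity_and_fp_rate(scores, cutoff):
    """Among honest responders, a score at or below the cutoff is a false
    positive for malingering; specificity = 1 - false-positive rate."""
    positives = sum(1 for s in scores if s <= cutoff)
    fp_rate = positives / len(scores)
    return 1.0 - fp_rate, fp_rate

# Hypothetical scores for 21 honest examinees; flag if 8 or fewer items recalled.
scores = [12, 10, 9, 8, 7, 11, 13, 9, 8, 10, 12, 6, 9, 11, 8, 10, 13, 7, 9, 12, 10]
spec, fp = specificity_and_fp_rate(scores, cutoff=8)
print(round(fp, 3))  # → 0.286 (6 of 21 honest examinees misclassified)
```

The study's point is visible in this arithmetic: a cutoff calibrated on the general population misclassifies a large share of examinees with intellectual disability, so the cutoff, not the examinee, is what needs adjusting.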

  15. The differential item functioning and structural equivalence of a nonverbal cognitive ability test for five language groups

    Directory of Open Access Journals (Sweden)

    Pieter Schaap

    2011-03-01

    Full Text Available Orientation: For a number of years, eliminating the language component in testing by using nonverbal cognitive tests has been proposed as a possible solution to the effect of groups’ languages (mother tongues or first languages) on test performance. This is particularly relevant in South Africa with its 11 official languages. Research purpose: The aim of the study was to determine the differential item functioning (DIF) and structural equivalence of a nonverbal cognitive ability test (the PiB/SpEEx Observance test [401]) for five South African language groups. Motivation for study: Culturally and linguistically sensitive tests can lead to unfair discrimination and are a contentious workplace issue in South Africa today. Misconceptions about psychometric testing in industry can cause tests to lose credibility if industries do not use a scientifically sound test-by-test evaluation approach. Research design, approach and method: The researcher used a quasi-experimental design and factor-analytic and logistic regression techniques to meet the research aims. The study used a convenience sample drawn from industry and an educational institution. Main findings: The main findings of the study show structural equivalence of the test at a holistic level and nonsignificant DIF effect sizes for most of the comparisons that the researcher made. Practical/managerial implications: This research shows that the PIB/SpEEx Observance Test (401) is not completely language-insensitive. One should see it rather as a language-reduced test when people from different language groups need testing. Contribution/value-add: The findings provide supporting evidence that nonverbal cognitive tests are plausible alternatives to verbal tests when one compares people from different language groups.

  16. Evaluation of the box and blocks test, stereognosis and item banks of activity and upper extremity function in youths with brachial plexus birth palsy.

    Science.gov (United States)

    Mulcahey, Mary Jane; Kozin, Scott; Merenda, Lisa; Gaughan, John; Tian, Feng; Gogola, Gloria; James, Michelle A; Ni, Pengsheng

    2012-09-01

    One of the greatest limitations to measuring outcomes in pediatric orthopaedics is the lack of effective instruments. Computer adaptive testing, which uses large item banks, selects only items that are relevant to a child's function based on previous responses and filters out items that are too easy, too hard, or simply not relevant to the child. In this way, computer adaptive testing provides a meaningful, efficient, and precise method to evaluate patient-reported outcomes. Banks of items that assess activity and upper extremity (UE) function have been developed for children with cerebral palsy and have enabled computer adaptive tests that showed strong reliability, strong validity, and broader content range when compared with traditional instruments. Because of the void in instruments for children with brachial plexus birth palsy (BPBP) and the importance of having a UE and activity scale, we were interested in how well these items worked in this population. A cross-sectional, multicenter study involving 200 children with BPBP was conducted. The box and block test (BBT) and stereognosis tests were administered, and patient reports of UE function and activity were obtained with the cerebral palsy item banks. Differential item functioning (DIF) was examined. The predictive ability of the BBT and stereognosis was evaluated with a proportional odds logistic regression model. Spearman correlation coefficients (rs) were calculated to examine the correlation between stereognosis and the BBT and between individual stereognosis items and the total stereognosis score. Six of the 86 items showed DIF, indicating that the activity and UE item banks may be useful for computer adaptive tests for children with BPBP. The penny and the button were the strongest predictors of impairment level (odds ratio = 0.34 to 0.40). There was a good positive relationship between total stereognosis and BBT scores (rs = 0.60). The BBT had a good negative (rs = -0.55) and good positive (rs = 0.55) relationship with

  17. Evaluation of item candidates: the PROMIS qualitative item review.

    Science.gov (United States)

    DeWalt, Darren A; Rothrock, Nan; Yount, Susan; Stone, Arthur A

    2007-05-01

    One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures.

  18. [Repeated measurement of memory with valenced test items: verbal memory, working memory and autobiographic memory].

    Science.gov (United States)

    Kuffel, A; Terfehr, K; Uhlmann, C; Schreiner, J; Löwe, B; Spitzer, C; Wingenfeld, K

    2013-07-01

    A large number of questions in clinical and/or experimental neuropsychology require the multiple repetition of memory tests at relatively short intervals. Studies on the impact of the associated practice and interference effects on the validity of the test results are rare. Moreover, hardly any neuropsychological instruments exist to date to record memory performance with several parallel versions in which the emotional valence of the test material is also taken into consideration. The aim of the present study was to test whether a working memory test (WST; a digit-span task with neutral or negative distraction stimuli) devised by our workgroup can be used with repeated measurements. This question was also examined for parallel versions of a wordlist learning paradigm and an autobiographical memory test (AMT). Both tests contained stimuli with neutral, positive and negative valence. Twenty-four participants completed the memory testing, including the working memory test and three versions of the wordlist and the AMT, at intervals of one week (measurement points 1-3). The results reveal consistent performances across the three measurement points in the working and autobiographical memory tests. The valence of the stimulus material did not influence memory performance. In the delayed recall of the wordlist, an improvement in memory performance over time was seen. The working memory test presented and the parallel versions for declarative and autobiographical memory constitute economical instruments for repeated-measures designs. While the WST and AMT are appropriate for study designs with repeated measurements at relatively short intervals, longer intervals may be more favourable for the use of wordlist learning paradigms. © Georg Thieme Verlag KG Stuttgart · New York.

  19. Measuring Cognitive Load in Test Items: Static Graphics versus Animated Graphics

    Science.gov (United States)

    Dindar, M.; Kabakçi Yurdakul, I.; Inan Dönmez, F.

    2015-01-01

    The majority of multimedia learning studies focus on the use of graphics in learning process but very few of them examine the role of graphics in testing students' knowledge. This study investigates the use of static graphics versus animated graphics in a computer-based English achievement test from a cognitive load theory perspective. Three…

  20. The results of the "essential laboratory tests" applied to new outpatients--re-evaluation of diagnostic efficiencies of the test items.

    Science.gov (United States)

    Takemura, Y; Kobayashi, H; Kugai, N; Sekiguchi, S

    1996-06-01

    We have analyzed the diagnostic efficiencies of the individual "Essential laboratory test" items when these tests were applied to 520 new outpatients in the division of comprehensive medicine of a teaching hospital. The integration of these test results with history-taking and physical examination resulted in 544 primary clinical diagnoses that corresponded to the patients' presenting complaints, and in 361 additional diagnoses unrelated to their chief complaints but found by chance through the addition of the test results. The clinical usefulness of these test items was variable depending on the disease category, with superior diagnostic efficiency in infectious or inflammatory diseases, liver and biliary tract diseases, hematological disorders, and metabolic diseases such as hyperlipidemia and diabetes mellitus, but a lesser degree of usefulness in gastrointestinal or neurogenic diseases. Urine urobilinogen could not establish its clinical usefulness because of extremely low diagnostic sensitivity, even in liver diseases. The leukocyte differential count provided confirmatory information for infectious or inflammatory diseases and was helpful for estimating the etiologic nature of infectious diseases. This study failed to settle the controversy over the adoption of sialic acid instead of the erythrocyte sedimentation rate (ESR) among the "Essential laboratory test" items, since the former test showed lower sensitivity, though higher specificity, in infectious or inflammatory conditions than the ESR. A low albumin/globulin ratio (A/G) showed diagnostic sensitivity and specificity equivalent to elevated alpha 1 and/or alpha 2 globulin fractions in infectious or inflammatory conditions, and was helpful for evaluating the patient's general condition at a glance. Incidental analysis of the diagnostic value of cholinesterase and random blood glucose for the detection of fatty liver and diabetes mellitus, respectively, suggested that these two tests may be included in

  1. Item Replenishing in Cognitive Diagnostic Computerized Adaptive Testing

    Institute of Scientific and Technical Information of China (English)

    陈平; 辛涛

    2011-01-01

    Item replenishing is essential for item bank maintenance and development in cognitive diagnostic computerized adaptive testing (CD-CAT). Compared with item replenishing in regular CAT, item replenishing in CD-CAT is more complicated because it requires constructing the Q-matrix (Embretson, 1984; Tatsuoka, 1995) corresponding to the new items (denoted as Qnew_item). However, the Qnew_item is often constructed manually by content experts and psychometricians, which brings about two issues: first, it takes experts a lot of time and effort to discuss and complete the attribute identification task, especially when the number of new items is large; second, the Qnew_item identified by experts is not guaranteed to be totally correct because experts often disagree in the discussion. Therefore, this study borrowed the main idea of the joint maximum likelihood estimation (JMLE) method in unidimensional item response theory (IRT) to propose the joint estimation algorithm (JEA), which depends fully on the examinees' responses to the operational and new items to jointly and automatically estimate the Qnew_item and the item parameters of new items in the context of CD-CAT under the Deterministic Inputs, Noisy "and" Gate (DINA) model. The results show that JEA recovers the attribute vectors and item parameters of new items well when item parameters are relatively small and the sample is relatively large, and that sample size, the magnitude of the item parameters, and the initial values of the item parameters all affect JEA's performance. A simulation study was conducted to investigate whether the JEA algorithm could accurately and efficiently estimate the Qnew_item and the item parameters of new items under
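Under the DINA model named above, an examinee answers an item correctly with probability 1 - slip when they master every attribute the item's Q-matrix row requires, and with the guessing probability otherwise. A minimal sketch (attribute patterns and parameter values invented):

```python
def dina_p_correct(alpha, q_row, slip, guess):
    """DINA model: alpha = examinee's mastered attributes (0/1 vector),
    q_row = attributes the item requires (one Q-matrix row).
    eta = 1 iff every required attribute is mastered."""
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if eta else guess

# Hypothetical item requiring attributes 1 and 3, with slip = .1, guess = .2.
q_row = [1, 0, 1]
print(dina_p_correct([1, 0, 1], q_row, 0.1, 0.2))  # masters both → 0.9
print(dina_p_correct([1, 1, 0], q_row, 0.1, 0.2))  # missing attribute 3 → 0.2
```

Estimating a new item's Q-row, as JEA does, amounts to finding the required-attribute pattern that best explains which examinees answered it correctly.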

  2. Using Automated Processes to Generate Test Items And Their Associated Solutions and Rationales to Support Formative Feedback

    Directory of Open Access Journals (Sweden)

    Mark Gierl

    2015-08-01

    Full Text Available Automatic item generation is the process of using item models to produce assessment tasks using computer technology. An item model is similar to a template that highlights the elements in the task that must be manipulated to produce new items. The purpose of our study is to describe an innovative method for generating large numbers of diverse and heterogeneous items along with their solutions and associated rationales to support formative feedback. We demonstrate the method by generating items in two diverse content areas, mathematics and nonverbal reasoning

  3. Claims, Evidence and Achievement Level Descriptors as a Foundation for Item Design and Test Specifications

    Science.gov (United States)

    Hendrickson, Amy; Huff, Kristen; Luecht, Ric

    2009-01-01

    [Slides] presented at the Annual Meeting of National Council on Measurement in Education (NCME) in San Diego, CA in April 2009. This presentation describes how the vehicles for gathering student evidence--task models and test specifications--are developed.

  4. What Is the Ability Emotional Intelligence Test (MSCEIT) Good for? An Evaluation Using Item Response Theory

    OpenAIRE

    Marina Fiori; Jean-Philippe Antonietti; Moira Mikolajczak; Olivier Luminet; Michel Hansenne; Jérôme Rossier

    2014-01-01

    The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer-Salovey-Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this...

  5. Clinical utility of a single-item test for DSM-5 alcohol use disorder among outpatients with anxiety and depressive disorders.

    Science.gov (United States)

    Bartoli, Francesco; Crocamo, Cristina; Biagi, Enrico; Di Carlo, Francesco; Parma, Francesca; Madeddu, Fabio; Capuzzi, Enrico; Colmegna, Fabrizia; Clerici, Massimo; Carrà, Giuseppe

    2016-08-01

    There is a lack of studies testing the accuracy of fast screening methods for alcohol use disorder in mental health settings. We aimed at estimating the clinical utility of a standard single-item test for case finding and screening of DSM-5 alcohol use disorder among individuals suffering from anxiety and mood disorders. We recruited adults consecutively referred, over a 12-month period, to an outpatient clinic for anxiety and depressive disorders. We assessed the National Institute on Alcohol Abuse and Alcoholism (NIAAA) single-item test, using the Mini-International Neuropsychiatric Interview (MINI), plus an additional item of the Composite International Diagnostic Interview (CIDI) for craving, as the reference standard to diagnose a current DSM-5 alcohol use disorder. We estimated the sensitivity and specificity of the single-item test, as well as positive and negative Clinical Utility Indexes (CUIs). 242 subjects with anxiety and mood disorders were included. The NIAAA single-item test showed high sensitivity (91.9%) and specificity (91.2%) for DSM-5 alcohol use disorder. The positive CUI was 0.601, whereas the negative one was 0.898, with excellent values also accounting for main individual characteristics (age, gender, diagnosis, psychological distress levels, smoking status). Testing the relevant indexes, we found excellent clinical utility of the NIAAA single-item test for screening out true negative cases. Our findings support the routine use of reliable methods for rapid screening in similar mental health settings. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
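The Clinical Utility Indexes reported above combine discrimination with predictive value: CUI+ = sensitivity × PPV (case finding) and CUI− = specificity × NPV (screening). A sketch using a 2×2 table consistent with the reported n = 242, sensitivity, and specificity; the cell counts themselves are reconstructed for illustration, not taken from the paper:

```python
def clinical_utility_indexes(tp, fp, fn, tn):
    """CUI+ = sensitivity * PPV (case finding);
    CUI- = specificity * NPV (screening)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return sens * ppv, spec * npv

# Reconstructed 2x2 table: single-item screen vs. diagnostic interview.
cui_pos, cui_neg = clinical_utility_indexes(tp=34, fp=18, fn=3, tn=187)
print(round(cui_pos, 3), round(cui_neg, 3))  # → 0.601 0.898
```

The asymmetry is informative: the lower CUI+ reflects the modest positive predictive value at this prevalence, while the high CUI− is what justifies using the item to rule out disorder.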

  6. The effect of the Trier Social Stress Test (TSST) on item and associative recognition of words and pictures in healthy participants

    Directory of Open Access Journals (Sweden)

    Jonathan Guez

    2016-04-01

    Full Text Available Psychological stress, induced by the Trier Social Stress Test (TSST), has repeatedly been shown to alter memory performance. Although factors influencing memory performance such as stimulus nature (verbal/pictorial) and emotional valence have been extensively studied, results on whether stress impairs or improves memory are still inconsistent. This study aimed at exploring the effect of the TSST on item versus associative memory for neutral verbal and pictorial stimuli. Forty-eight healthy subjects were recruited; 24 participants were randomly assigned to the TSST group and the remaining 24 participants were assigned to the control group. Stress reactivity was measured by psychological (subjective state anxiety ratings) and physiological (galvanic skin response recording) measurements. Subjects performed an item-association memory task for both stimulus types (words, pictures) simultaneously, before and after the stress/non-stress manipulation. The results showed that memory recognition for pictorial stimuli was higher than for verbal stimuli. Memory for both words and pictures was impaired following the TSST; while the source of this impairment was specific to associative recognition for pictures, a more general deficit was observed for verbal material, as expressed in decreased recognition of both items and associations following the TSST. Response latency analysis indicated that the TSST manipulation decreased response time but at the cost of memory accuracy. We conclude that stress does not uniformly affect memory; rather, it interacts with the task's cognitive load and stimulus type. Applying the current study results to patients diagnosed with disorders associated with traumatic stress, our findings in healthy subjects under acute stress provide further support for our assertion that patients' impaired memory originates in poor recollection processing following depletion of attentional resources.

  7. Gender differential item functioning on a national field-specific test: The case of PhD entrance exam of TEFL in Iran

    Directory of Open Access Journals (Sweden)

    Alireza Ahmadi

    2016-01-01

    Full Text Available Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successfully answering a given item. This study examined gender differential item functioning on the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response theory (1-p IRT) models. The PEET is a national test consisting of a centralized written examination designed to provide information on the eligibility of PhD applicants in TEFL to enter PhD programs. The 2013 administration of this test provided score data for a sample of 999 Iranian PhD applicants, consisting of 397 males and 602 females. First, the data were subjected to DIF analysis through the logistic regression (LR) model. Then, to triangulate the findings, a 1-p IRT procedure was applied. The results indicated (1) more items flagged for DIF by LR than by 1-p IRT, (2) DIF cancellation (the number of DIF items was equal for both males and females), as revealed through LR, (3) an equal number of uniform and non-uniform DIF items, as tracked via LR, and (4) female superiority in test performance, as revealed via IRT analysis. Overall, the findings of the study indicated that the PEET suffers from DIF. As such, test developers and policymakers (like NOET & MSRT) are recommended to take these findings into serious consideration and exercise care in fair test practice by dedicating effort to more unbiased test development and decision making.
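The LR procedure in the abstract tests DIF by predicting item success from total score, group, and their interaction, which needs an iterative model fit. A lighter classic screen for uniform DIF, standing in here for the LR fit, is the Mantel-Haenszel common odds ratio computed over 2×2 tables stratified by total score (all tables below are invented):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio.
    strata: list of (ref_correct, ref_wrong, focal_correct, focal_wrong)
    2x2 tables, one per total-score level."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical tables for one item at three score levels (reference vs. focal).
strata = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha = mh_odds_ratio(strata)
delta = -2.35 * math.log(alpha)  # ETS delta scale; |delta| >= 1.5 flags large DIF
print(round(alpha, 2))  # → 2.39: the item favors the reference group at matched ability
```

An odds ratio near 1 (delta near 0) means no uniform DIF; note that MH, unlike LR, cannot detect the non-uniform DIF the study also reports.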

  8. Psychometric evaluation of the EORTC computerized adaptive test (CAT) fatigue item pool

    DEFF Research Database (Denmark)

    Petersen, Morten Aa; Giesinger, Johannes M; Holzner, Bernhard

    2013-01-01

    Fatigue is one of the most common symptoms associated with cancer and its treatment. To obtain a more precise and flexible measure of fatigue, the EORTC Quality of Life Group has developed a computerized adaptive test (CAT) measure of fatigue. This is part of an ongoing project developing a CAT v...

  9. Item and Test Construct Definition for the New Spanish Baccalaureate Final Evaluation: A Proposal

    Science.gov (United States)

    Laborda, Jesús García; Martin-Monje, Elena

    2013-01-01

    The current English section of the University Entrance Examination (PAU) has kept the same format for twenty years. The Bologna process has provided new reasons to vary its current format, since the majority of international reputed tests usually include oral sections with both listening and speaking tasks. Recently the Universidad de Alcalá…

  10. Development and Application of a New Approach to Testing the Bipolarity of Semantic Differential Items.

    Science.gov (United States)

    Cogliser, Claudia C.; Schriesheim, Chester A.

    1994-01-01

    A method of testing semantic differential scales for bipolarity is developed using a new conception of bipolarity that does not require unidimensionality. Assessment of Fiedler's Least Preferred Coworker instrument with 63 college student subjects using multidimensional scaling revealed significant departures from bipolarity. (SLD)

  11. Explanatory item response modeling of children's change on a dynamic test of analogical reasoning

    NARCIS (Netherlands)

    Stevenson, C.E.; Hickendorff, M.; Resing, W.C.M.; Heiser, W.J.; de Boeck, P.A.L.

    Dynamic testing is an assessment method in which training is incorporated into the procedure with the aim of gauging cognitive potential. Large individual differences are present in children's ability to profit from training in analogical reasoning. The aim of this experiment was to investigate

  12. Development of a lack of appetite item bank for computer-adaptive testing (CAT)

    NARCIS (Netherlands)

    Thamsborg, L.H.; Petersen, M.A.; Aaronson, N.K.; Chie, W.C.; Costantini, A.; Holzner, B.; Verdonck-de Leeuw, I.M.; Young, T.; Groenvold, M.

    2015-01-01

    Purpose: A significant proportion of oncological patients experiences lack of appetite. Precise measurement is relevant to improve the management of lack of appetite. The so-called computer-adaptive test (CAT) allows for adaptation of the questionnaire to the individual patient, thereby optimizing

  13. Chemical and Biological Contamination Survivability (CBCS), Large Item Exteriors. Test Operations Procedure

    Science.gov (United States)

    2011-03-21

    Keywords: mission-oriented protective posture, level IV; protective clothing; ME; mission-essential ... tested are measured while individuals and/or crew members are wearing normal uniforms and while wearing mission-oriented protective posture, level IV ... be conducted outdoors. Environmental conditions are monitored, the SUT is allowed to equilibrate with the ambient conditions, and any required

  14. Avanços na psicometria: da Teoria Clássica dos Testes à Teoria de Resposta ao Item

    Directory of Open Access Journals (Sweden)

    Laisa Marcorela Andreoli Sartes

    2013-01-01

    Full Text Available In the twentieth century, the development and evaluation of the psychometric properties of tests were based mainly on Classical Test Theory (CTT). Many tests are long and redundant, with measures influenced by the characteristics of the sample of individuals assessed during their development; some of these limitations are consequences of the use of CTT. Item Response Theory (IRT) emerged as a possible solution to some of the limitations of CTT, improving the quality of the evaluation of test structure. In this text we critically compare the characteristics of CTT and IRT as methods for evaluating the psychometric properties of tests. The advantages and limitations of each method are discussed.

  15. Item Banking with Embedded Standards

    Science.gov (United States)

    MacCann, Robert G.; Stanley, Gordon

    2009-01-01

    An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…
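
    To make the mechanics concrete: a sketch, assuming a bank in which each item carries an Angoff rating (the judged probability that a minimally competent examinee answers it correctly), so that a form assembled from the bank inherits a cut score equal to the sum of its items' ratings. The data and field names below are hypothetical.

```python
# Hypothetical item bank: each item stores an Angoff rating, i.e. the judged
# probability that a minimally competent examinee answers it correctly.
item_bank = [
    {"id": "Q1", "angoff": 0.80},
    {"id": "Q2", "angoff": 0.55},
    {"id": "Q3", "angoff": 0.65},
    {"id": "Q4", "angoff": 0.40},
]

def cut_score(items):
    """Angoff cut score for a test form = sum of its item ratings."""
    return sum(item["angoff"] for item in items)

# A form assembled from the bank carries its passing standard with it.
form = item_bank[:3]
print(round(cut_score(form), 2))  # 2.0 of 3 possible points
```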


  17. Changes in Word Usage Frequency May Hamper Intergenerational Comparisons of Vocabulary Skills: An Ngram Analysis of Wordsum, WAIS, and WISC Test Items

    Science.gov (United States)

    Roivainen, Eka

    2014-01-01

    Research on secular trends in mean intelligence test scores shows smaller gains in vocabulary skills than in nonverbal reasoning. One possible explanation is that vocabulary test items become outdated faster compared to nonverbal tasks. The history of the usage frequency of the words on five popular vocabulary tests, the GSS Wordsum, Wechsler…

  18. easyCBM CCSS Math Item Scaling and Test Form Revision (2012-2013): Grades 6-8. Technical Report #1313

    Science.gov (United States)

    Anderson, Daniel; Alonzo, Julie; Tindal, Gerald

    2012-01-01

    The purpose of this technical report is to document the piloting and scaling of new easyCBM mathematics test items aligned with the Common Core State Standards (CCSS) and to describe the process used to revise and supplement the 2012 research version easyCBM CCSS math tests in Grades 6-8. For all operational 2012 research version test forms (10…

  19. Three Statistical Testing Procedures in Logistic Regression: Their Performance in Differential Item Functioning (DIF) Investigation. Research Report. ETS RR-09-35

    Science.gov (United States)

    Paek, Insu

    2009-01-01

    Three statistical testing procedures well-known in the maximum likelihood approach are the Wald, likelihood ratio (LR), and score tests. Although well-known, the application of these three testing procedures in the logistic regression method to investigate differential item function (DIF) has not been rigorously made yet. Employing a variety of…
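
    Of the three procedures the abstract names, the likelihood ratio and Wald tests are easy to sketch with a hand-rolled logistic fit; the score test is omitted for brevity. Everything below (data, effect sizes) is simulated for illustration, not taken from the report.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        hessian = X.T @ (X * W[:, None])          # observed information matrix
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    llf = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, llf, cov

# Simulated data: the item shows uniform DIF of 0.8 logits for the focal group.
rng = np.random.default_rng(0)
n = 2000
total = rng.normal(0.0, 1.0, n)               # matching variable (total score)
group = rng.integers(0, 2, n)                 # 0 = reference, 1 = focal
logit = 0.5 + 1.2 * total + 0.8 * group
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X_reduced = np.column_stack([np.ones(n), total])          # no group term
X_full = np.column_stack([np.ones(n), total, group])      # with group term

_, ll0, _ = fit_logistic(X_reduced, y)
b1, ll1, cov1 = fit_logistic(X_full, y)

lr = 2.0 * (ll1 - ll0)                        # likelihood ratio statistic, df = 1
wald = b1[2] ** 2 / cov1[2, 2]                # Wald statistic for the group term
print(lr > 3.84, wald > 3.84)                 # both exceed the 5% chi-square(1) cutoff
```

    Asymptotically the two statistics agree; in small samples they can lead to different DIF decisions, which is the kind of performance difference the report investigates.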

  20. The EXCITE Trial: analysis of "noncompleted" Wolf Motor Function Test items.

    Science.gov (United States)

    Wolf, Steven L; Thompson, Paul A; Estes, Emily; Lonergan, Timothy; Merchant, Rozina; Richardson, Natasha

    2012-02-01

    This is the first study to examine Wolf Motor Function Test (WMFT) tasks among EXCITE Trial participants that could not be completed at baseline or 2 weeks later. Data were collected from participants who received constraint-induced movement therapy (CIMT) immediately at the time of randomization (CIMT-I, n = 106) and from those for whom there was a delay of 1 year in receiving this intervention (CIMT-D, n = 116). Data were collected at baseline and at a 2-week time point, during which the CIMT-I group received the CIMT intervention and the CIMT-D group did not. Generalized estimating equation (GEE) analyses were used to examine repeated binary data and count values. Group and visit interactions were assessed, adjusting for functional level, affected side, dominant side, age, and gender covariates. In CIMT-I participants, there was an increase in the proportion of completed tasks at posttest compared with CIMT-D participants, particularly with respect to those tasks requiring dexterity with small objects and total incompletes (P < .0033). Compared with baseline, 120 tasks governing distal limb use for CIMT-I and 58 tasks dispersed across the WMFT for CIMT-D could be completed after 2 weeks. Common movement components that may have contributed to incomplete tasks include shoulder stabilization and flexion, elbow flexion and extension, wrist pronation, supination and ulnar deviation, and pincer grip. CIMT training should emphasize therapy for those specific movement components in patients who meet the EXCITE criteria for baseline motor control.

  1. Test-retest reliability of selected items of Health Behaviour in School-aged Children (HBSC survey questionnaire in Beijing, China

    Directory of Open Access Journals (Sweden)

    Liu Yang

    2010-08-01

    Full Text Available Abstract Background Children's health and health behaviour are essential for their development, and it is important to obtain abundant and accurate information to understand young people's health and health behaviour. The Health Behaviour in School-aged Children (HBSC) study is among the first large-scale international surveys on adolescent health through self-report questionnaires. So far, more than 40 countries in Europe and North America have been involved in the HBSC study. The purpose of this study is to assess the test-retest reliability of selected items in the Chinese version of the HBSC survey questionnaire in a sample of adolescents in Beijing, China. Methods A sample of 95 male and female students aged 11 or 15 years participated in a test and retest with a three-week interval. Student identity numbers of respondents were utilized to permit matching of test-retest questionnaires. 23 items concerning physical activity, sedentary behaviour, sleep and substance use were evaluated by using the percentage of response shifts and the single-measure intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) for all respondents and stratified by gender and age. Items on substance use were only evaluated for school children aged 15 years. Results The percentage of no response shift between test and retest varied from 32% for the item on computer use at weekends to 92% for the three items on smoking. Of all the 23 items evaluated, 6 items (26%) showed a moderate reliability, 12 items (52%) displayed a substantial reliability and 4 items (17%) indicated almost perfect reliability. No gender or age group difference in test-retest reliability was found except for a few items on sedentary behaviour. Conclusions The overall findings of this study suggest that most selected indicators in the HBSC survey questionnaire have satisfactory test-retest reliability for the students in Beijing. Further test-retest studies in a large
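
    The single-measure ICC used in such test-retest designs can be sketched under the assumption of a one-way random-effects formulation, ICC(1,1); the HBSC analysis may have used a different variant, and the data below are hypothetical.

```python
import numpy as np

def icc_oneway(ratings):
    """Single-measure one-way ICC(1,1) from an (n subjects x k occasions) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    msb = k * np.sum((row_means - grand) ** 2) / (n - 1)               # between subjects
    msw = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))  # within subjects
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical test-retest data: 5 students, 2 occasions, mostly stable answers.
scores = [[3, 3], [5, 4], [2, 2], [4, 4], [1, 2]]
print(round(icc_oneway(scores), 2))  # 0.88: substantial test-retest reliability
```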


  3. Robust associations between the 20-item prosopagnosia index and the Cambridge Face Memory Test in the general population.

    Science.gov (United States)

    Gray, Katie L H; Bird, Geoffrey; Cook, Richard

    2017-03-01

    Developmental prosopagnosia (DP) is a neurodevelopmental condition, characterized by lifelong face recognition deficits. Leading research groups diagnose the condition using complementary computer-based tasks and self-report measures. In an attempt to standardize the reporting of self-report evidence, we recently developed the 20-item prosopagnosia index (PI20), a short questionnaire measure of prosopagnosic traits suitable for screening adult samples for DP. Strong correlations between scores on the PI20 and performance on the Cambridge Face Memory Test (CFMT) appeared to confirm that individuals possess sufficient insight into their face recognition ability to complete a self-report measure of prosopagnosic traits. However, the extent to which people have insight into their face recognition abilities remains contentious. A lingering concern is that feedback from formal testing, received prior to administration of the PI20, may have augmented the self-insight of some respondents in the original validation study. To determine whether the significant correlation with the CFMT was an artefact of previously delivered feedback, we sought to replicate the validation study in individuals with no history of formal testing. We report highly significant correlations in two independent samples drawn from the general population, confirming: (i) that a significant relationship exists between PI20 scores and performance on the CFMT, and (ii) that this is not dependent on the inclusion of individuals who have previously received feedback. These findings support the view that people have sufficient insight into their face recognition abilities to complete a self-report measure of prosopagnosic traits.

  4. Item and test analysis to identify quality multiple choice questions (MCQS) from an assessment of medical students of Ahmedabad, Gujarat

    OpenAIRE

    Sanju Gajjar; Rashmi Sharma; Pradeep Kumar; Manish Rana

    2014-01-01

    Background: Multiple choice questions (MCQs) are frequently used to assess students in different educational streams for their objectivity and wide reach of coverage in less time. However, the MCQs to be used must be of quality, which depends upon their difficulty index (DIF I), discrimination index (DI) and distracter efficiency (DE). Objective: To evaluate MCQs or items and develop a pool of valid items by assessing them with DIF I, DI and DE, and also to revise/store or discard items based on obta...
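
    A sketch of the three indices, using one common set of conventions (definitions and cutoffs vary across texts; the values here are illustrative, not necessarily those of the study):

```python
# Illustrative item-analysis indices for one MCQ (data are hypothetical).
#   difficulty index      = % of examinees answering correctly
#   discrimination index  = (upper-group correct - lower-group correct) / group size
#   distracter efficiency = % of distracters chosen by at least 5% of examinees

def difficulty_index(correct, n):
    return 100.0 * correct / n

def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

def distracter_efficiency(option_counts, key, n, threshold=0.05):
    distracters = [c for opt, c in option_counts.items() if opt != key]
    functional = sum(1 for c in distracters if c / n >= threshold)
    return 100.0 * functional / len(distracters)

# 100 examinees; option C is the key; option E draws nobody (non-functional).
counts = {"A": 10, "B": 15, "C": 70, "D": 5, "E": 0}
print(difficulty_index(counts["C"], 100))       # 70.0
print(discrimination_index(24, 10, 27))         # ~0.52, using upper/lower 27% groups
print(distracter_efficiency(counts, "C", 100))  # 75.0
```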

  5. Examining Differential Item Functioning Trends for English Language Learners in a Reading Test: A Meta-Analytical Approach

    Science.gov (United States)

    Koo, Jin; Becker, Betsy Jane; Kim, Young-Suk

    2014-01-01

    In this study, differential item functioning (DIF) trends were examined for English language learners (ELLs) versus non-ELL students in third and tenth grades on a large-scale reading assessment. To facilitate the analyses, a meta-analytic DIF technique was employed. The results revealed that items requiring knowledge of words and phrases in…

  6. Gender and Language Differences on the Test of Workplace Essential Skills: Using Overall Mean Scores and Item-Level Differential Item Functioning Analyses

    Science.gov (United States)

    Kline, Theresa J. B.

    2004-01-01

    The Test of Workplace Essential Skills (TOWES) assesses cognitive skills in three areas using the following three separate subscales: Reading Text, Document Use, and Numeracy in Working-Age Adults. The sample was composed of 2,688 working-age English-speaking Canadians who came from a variety of settings (e.g., trades training programs, adult…

  7. Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing.

    Science.gov (United States)

    Cai, Li

    2015-06-01

    Lord and Wingersky's (Appl Psychol Meas 8:453-461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord-Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord-Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.
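
    The core Lord-Wingersky recursion for dichotomous items is compact enough to sketch directly; this is the classic unidimensional version, not the dimension-reduced extension the paper develops:

```python
# Lord-Wingersky recursion: given each item's probability of a correct response
# at a fixed ability value, build the likelihood of every possible summed score.

def lord_wingersky(p_correct):
    """p_correct: list of P_i(theta) for dichotomous items at one theta value.
    Returns a list L where L[s] = P(summed score = s | theta)."""
    scores = [1.0]                       # zero items: score 0 with certainty
    for p in p_correct:
        new = [0.0] * (len(scores) + 1)
        for s, mass in enumerate(scores):
            new[s] += mass * (1.0 - p)   # item answered incorrectly
            new[s + 1] += mass * p       # item answered correctly
        scores = new
    return scores

# Three fair-coin items yield the binomial(3, 0.5) summed-score distribution.
likelihood = lord_wingersky([0.5, 0.5, 0.5])
print([round(x, 3) for x in likelihood])  # [0.125, 0.375, 0.375, 0.125]
```

    Running the recursion at each quadrature point and weighting by the prior gives the summed-score posteriors the paper builds on.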

  8. A unified factor-analytic approach to the detection of item and test bias: Illustration with the effect of providing calculators to students with dyscalculia

    Directory of Open Access Journals (Sweden)

    Lee, M. K.

    2016-01-01

    Full Text Available An absence of measurement bias against distinct groups is a prerequisite for the use of a given psychological instrument in scientific research or high-stakes assessment. Factor analysis is the framework explicitly adopted for the identification of such bias when the instrument consists of a multi-test battery, whereas item response theory is employed when the focus narrows to a single test composed of discrete items. Item response theory can be treated as a mild nonlinearization of the standard factor model, and thus the essential unity of bias detection at the two levels merits greater recognition. Here we illustrate the benefits of a unified approach with a real-data example, which comes from a statewide test of mathematics achievement where examinees diagnosed with dyscalculia were accommodated with calculators. We found that items that can be solved by explicit arithmetical computation became easier for the accommodated examinees, but the quantitative magnitude of this differential item functioning (measurement bias) was small.
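
    The "mild nonlinearization" point can be illustrated with the textbook conversion between a single-factor loading and a normal-ogive IRT discrimination (a sketch in the probit metric; the paper's own treatment is more general):

```python
import math

# Conversion between a factor loading (lambda) on a standardized latent trait
# and the normal-ogive 2PL discrimination (a), probit metric:
#     a = lambda / sqrt(1 - lambda^2)

def loading_to_discrimination(loading):
    return loading / math.sqrt(1.0 - loading ** 2)

def discrimination_to_loading(a):
    return a / math.sqrt(1.0 + a ** 2)

lam = 0.6
a = loading_to_discrimination(lam)
print(round(a, 3))                             # 0.75
print(round(discrimination_to_loading(a), 3))  # 0.6, i.e. the round trip recovers lambda
```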

  9. Developmental Validation of the ParaDNA® Screening System - A presumptive test for the detection of DNA on forensic evidence items.

    Science.gov (United States)

    Dawnay, Nick; Stafford-Allen, Beccy; Moore, Dave; Blackman, Stephen; Rendell, Paul; Hanson, Erin K; Ballantyne, Jack; Kallifatidis, Beatrice; Mendel, Julian; Mills, DeEtta K; Nagy, Randy; Wells, Simon

    2014-07-01

    Current assessment of whether a forensic evidence item should be submitted for STR profiling is largely based on the personal experience of the Crime Scene Investigator (CSI) and the submissions policy of the law enforcement authority involved. While there are chemical tests that can infer the presence of DNA through the detection of biological stains, the process remains mostly subjective and leads to many samples being submitted that give no profile, or not being submitted although DNA is present. The ParaDNA® Screening System was developed to address this issue. It consists of a sampling device, pre-loaded reaction plates and a detection instrument. The test uses direct PCR with fluorescent HyBeacon™ detection of PCR amplicons to identify the presence and relative amount of DNA on an evidence item, and also provides a gender identification result, in approximately 75 minutes. This simple-to-use design allows objective data to be acquired by both DNA analysts and non-specialist personnel, to enable a more informed submission decision to be made. The developmental validation study described here tested the sensitivity, reproducibility, accuracy, inhibitor tolerance, and performance of the ParaDNA Screening System on a range of mock evidence items. The data collected demonstrate that the ParaDNA Screening System identifies the presence of DNA on a variety of evidence items including blood, saliva and touch DNA items. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  10. Test of item-response bias in the CES-D scale: Experience from the New Haven EPESE study.

    Science.gov (United States)

    Cole, S R; Kawachi, I; Maller, S J; Berkman, L F

    2000-03-01

    We present results of item-response bias analyses of the exogenous variables age, gender, and race for all items from the Center for Epidemiologic Studies Depression (CES-D) scale using data (N = 2340) from the New Haven component of the Established Populations for Epidemiologic Studies of the Elderly (EPESE). The proportional odds of blacks responding higher on the CES-D items "people are unfriendly" and "people dislike me" were 2.29 (95% confidence interval: 1.74, 3.02) and 2.96 (95% confidence interval: 2.15, 4.07) times that of whites matched on overall depressive symptoms, respectively. In addition, the proportional odds of women responding higher on the CES-D item "crying spells" were 2.14 (95% confidence interval: 1.60, 2.82) times that of men matched on overall depressive symptoms. Our data indicate the CES-D would have greater validity among this diverse group of older men and women after removal of the crying item and two interpersonal items.
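
    The study fits proportional-odds models; as a simpler stand-in that conveys the same matching logic, the sketch below computes a Mantel-Haenszel common odds ratio for a dichotomized item across strata of the matching total symptom score. All counts are hypothetical, not the EPESE data.

```python
# Mantel-Haenszel common odds ratio for a dichotomized item, stratified by the
# matching variable (here, overall depressive-symptom score). An odds ratio
# well above 1 suggests the focal group endorses the item more often than
# reference-group members with the same overall symptom level.

def mantel_haenszel_or(strata):
    """strata: list of (a, b, c, d) tables per score stratum, where
    a = focal endorses, b = focal does not, c = reference endorses, d = does not."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

tables = [
    (20, 80, 10, 90),   # low total-score stratum
    (40, 60, 25, 75),   # middle stratum
    (60, 40, 45, 55),   # high stratum
]
print(round(mantel_haenszel_or(tables), 2))  # 1.98: focal group endorses more often
```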

  11. Estimation of ability and item parameters in mathematics testing by using the combination of 3PLM/ GRM and MCM/ GPCM scoring model

    Directory of Open Access Journals (Sweden)

    Abadyo Abadyo

    2015-06-01

    Full Text Available The main purpose of the study was to investigate the superiority of scoring with the combined MCM/GPCM model in comparison to the 3PLM/GRM model within mixed-item-format mathematics tests. To achieve this purpose, the impact of the two scoring models was investigated as a function of the test length, the sample size, and the M-C item proportion within the mixed-item-format test, with respect to: (1) estimation of ability and item parameters, (2) optimization of the test information function (TIF), (3) standard error rates, and (4) model fit to the data. The investigation made use of simulated data generated under a fixed-effects factorial design 2 x 3 x 3 x 3 with 5 replications, resulting in 270 data sets. The data were analyzed by means of fixed-effects MANOVA on the root mean square error (RMSE) of the ability estimates and the RMSE and root mean square deviation (RMSD) of the item parameters in order to identify the significant main effects at a level of α = .05; the interaction effects were incorporated into the error term for statistical testing. The -2LL statistics were also used in order to evaluate model fit to the data sets. The results of the study show that the combined MCM/GPCM model provides more accurate estimation than the 3PLM/GRM model. In addition, the test information given by the combined MCM/GPCM model is three times higher than that of the 3PLM/GRM model, although the test information does not support a solid conclusion about which sample size and M-C item proportion at each test length yield the optimal test information. Finally, the differences in fit statistics between the two scoring models favor the MCM/GPCM model over the 3PLM/GRM model.

  12. IRT Item Parameter Scaling for Developing New Item Pools

    Science.gov (United States)

    Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua

    2017-01-01

    Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…
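
    One of the simplest scaling procedures in this family, mean/sigma linking on common-item difficulties, can be sketched as follows (the report compares more elaborate procedures; the numbers here are constructed so the transformation is exact):

```python
import statistics

# Mean/sigma linking: place field-tested item difficulties from a new form (Y)
# onto the metric of the bank (X) using items common to both calibrations.

def mean_sigma_constants(b_common_X, b_common_Y):
    A = statistics.stdev(b_common_X) / statistics.stdev(b_common_Y)
    B = statistics.mean(b_common_X) - A * statistics.mean(b_common_Y)
    return A, B

def rescale(b_Y, A, B):
    """Transform difficulties: b* = A * b + B (discriminations become a / A)."""
    return [A * b + B for b in b_Y]

# Hypothetical common-item difficulties: Y's metric is shifted and shrunk.
b_X = [-1.0, 0.0, 1.0, 2.0]
b_Y = [0.0, 0.5, 1.0, 1.5]
A, B = mean_sigma_constants(b_X, b_Y)
print(round(A, 6), round(B, 6))                    # 2.0 -1.0
print([round(v, 6) for v in rescale(b_Y, A, B)])   # [-1.0, 0.0, 1.0, 2.0]
```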

  13. CUSUM Statistics for Large Item Banks: Computation of Standard Errors. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.

    Science.gov (United States)

    Glas, C. A. W.

    A previous study by C. Glas (1998) examined how to evaluate whether adaptive testing data used for online calibration sufficiently fit the item response model. Three approaches were suggested, based on a Lagrange multiplier (LM) statistic, a Wald statistic, and a cumulative sum (CUSUM) statistic, respectively. For all these methods,…
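
    A minimal upper one-sided CUSUM, the ingredient for which the report develops standard errors, looks like this in sketch form (the residuals and slack value are illustrative, not LSAC's actual procedure):

```python
# Upper one-sided CUSUM for monitoring item drift: each x is a standardized
# residual comparing observed and model-predicted performance on an item in
# successive calibration windows.

def cusum(residuals, slack=0.5):
    c, path = 0.0, []
    for x in residuals:
        c = max(0.0, c + x - slack)   # accumulate only above-slack deviations
        path.append(c)
    return path

stable  = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]   # item behaves as calibrated
drifted = [0.1, -0.2, 1.4, 1.6, 1.2, 1.5]    # item starts to misfit
threshold = 3.0
print(max(cusum(stable)) > threshold)    # False: no alarm
print(max(cusum(drifted)) > threshold)   # True: drift flagged
```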

  14. An Item Response Theory-Based, Computerized Adaptive Testing Version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

    Science.gov (United States)

    Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe

    2016-01-01

    Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…
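
    The heart of any IRT-based CAT of this kind is selecting the next item by maximum Fisher information at the current ability estimate; a 2PL sketch with a made-up bank (the CDI:WS CAT uses its own calibrated parameters):

```python
import math

def p_2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Pick the unused item with maximum information at the current theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: information(theta, *bank[i]))

bank = [(1.0, -2.0), (1.0, -0.1), (1.0, 1.5), (1.0, 3.0)]  # (a, b) pairs
print(next_item(0.0, bank, administered=set()))  # 1: difficulty closest to theta
```

    With equal discriminations, the selected item is simply the one whose difficulty lies closest to the current ability estimate, which is why a CAT can shorten a long checklist without losing precision.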

  15. Cuing effect of "all of the above" on the reliability and validity of multiple-choice test items.

    Science.gov (United States)

    Harasym, P H; Leong, E J; Violato, C; Brant, R; Lorscheider, F L

    1998-03-01

    It is generally acknowledged that alternatives such as "none of the above" and "all of the above" should be used sparingly in multiple-choice (MC) items. But the effect that "all of the above" has on the reliability and validity of an MC item is unclear. This study compared the results of a single-response (SRa) item format that included "all of the above" as the correct response to a multiple-response (MR) item format that required examinees to select all of the available alternatives for a correct response. A crossover design was used to compare the effect of formats on student performance while item content, scoring method, and student ability levels remained constant. Results indicated that the SRa format greatly distorted examinee performance by elevating their scores, because examinees who recognized two or more alternatives as being correct were cued to select "all of the above". In addition, the SRa format significantly reduced the reliability and concurrent validity of examinee scores. In summary, the MR format was found to be superior. Based upon new empirical evidence, this study recommends that whenever an educator wishes to evaluate student understanding of an issue that has multiple facts, the SRa format should be avoided and the MR format should be used instead.

  16. Multilevel Modeling of Item Position Effects

    Science.gov (United States)

    Albano, Anthony D.

    2013-01-01

    In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. A…

  17. An investigation of the effects of relative probability of old and new test items on the neural correlates of successful and unsuccessful source memory.

    Science.gov (United States)

    Vilberg, Kaia L; Rugg, Michael D

    2009-04-01

    The present event-related fMRI study addressed the question whether retrieval-related neural activity in lateral parietal cortex is affected by the relative probability of test items. We varied the proportion of old to new items across two test blocks, with 25% of the items being old in one block and 75% being old in the other. Prior to each block, participants (N=18) completed one of two types of study judgment on each of 108 object images. They then performed a source memory test with four response options: studied in task 1, studied in task 2, old but unsure of the study task, and new. Retrieval-related activity in regions previously identified as recollection-sensitive, including the left inferior lateral parietal cortex and bilateral medial temporal cortex, was unaffected by old/new ratio. Generic retrieval success effects--retrieval-related effects common to recognized items attracting either a correct or an incorrect source judgment--were identified in several regions of left superior parietal cortex. These effects dissociated between a middle region of the intraparietal sulcus (IPS), where activity did not interact with ratio, and regions anterior and posterior to the middle IPS where activity was sensitive to old/new ratio. The findings are inconsistent with prior proposals that retrieval-related activity in and around the left middle IPS reflects the relative salience of old and new test items. Rather, they suggest that, as in the case of more inferior left parietal regions, retrieval-related activity in this region reflects processes directly linked to retrieval.

  18. Differential Item Functioning Analysis Using a Mixture 3-Parameter Logistic Model with a Covariate on the TIMSS 2007 Mathematics Test

    Science.gov (United States)

    Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.

    2015-01-01

    The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…

  19. The use of the Rey 15-Item Test and recognition trial to evaluate noncredible effort after pediatric mild traumatic brain injury.

    Science.gov (United States)

    Green, Cassie M; Kirk, John W; Connery, Amy K; Baker, David A; Kirkwood, Michael W

    2014-01-01

    The Rey 15-Item Test (FIT) is a performance validity test commonly used in adult neuropsychological assessment. FIT classification statistics across studies have been variable, so a recognition trial was created to enhance the measure (Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002). The Rey 15-Item recognition trial: A technique to enhance sensitivity of the Rey 15-Item Memorization Test. Journal of Clinical and Experimental Neuropsychology, 24(5), 561-573.). The current study assessed the utility of the FIT and recognition trial in a pediatric mild traumatic brain injury sample (N = 319, M = 14.57 years). All participants were administered the FIT and recognition trial as part of an abbreviated clinical neuropsychological evaluation. Failure on the Medical Symptom Validity Test was used as the criterion for noncredible effort. Fifteen percent of the sample met the criterion. The traditional adult cutoff score of <9 on the FIT recall trial yielded excellent specificity (98%), but very poor sensitivity (12%). When the recognition trial was utilized, a total score of <26 resulted in the best combined cutoff score (sensitivity = 55%, specificity = 91%). Results indicate that the FIT with recognition trial may be useful in the assessment of noncredible effort with children and adolescents, at least among relatively high-functioning populations.
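
    Classification statistics like the sensitivity and specificity quoted for the <26 cutoff are computed as sketched below (counts are hypothetical, not the study's data):

```python
# Cutoff evaluation for a performance validity test: a score below the cutoff
# is called noncredible (a "positive" classification).

def sensitivity_specificity(scores, noncredible, cutoff):
    tp = sum(1 for s, nc in zip(scores, noncredible) if nc and s < cutoff)
    fn = sum(1 for s, nc in zip(scores, noncredible) if nc and s >= cutoff)
    tn = sum(1 for s, nc in zip(scores, noncredible) if not nc and s >= cutoff)
    fp = sum(1 for s, nc in zip(scores, noncredible) if not nc and s < cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Eight hypothetical examinees: combined recall + recognition totals, plus the
# criterion classification (True = noncredible effort on the external measure).
scores      = [24, 28, 25, 22, 27, 29, 25, 30]
noncredible = [True, True, False, True, False, False, True, False]
sens, spec = sensitivity_specificity(scores, noncredible, cutoff=26)
print(round(sens, 2), round(spec, 2))  # 0.75 0.75
```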

  20. Item-Writing Guidelines for Physics

    Science.gov (United States)

    Regan, Tom

    2015-01-01

    A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…

  1. Teoria de Resposta ao Item na análise de uma prova de estatística em universitários Item Response Theory to analyze a statistics test in university students

    Directory of Open Access Journals (Sweden)

    Claudette Maria Medeiros Vendramini

    2005-12-01

    Full Text Available This study aimed to apply Item Response Theory to the analysis of the 15 multiple-choice questions of a statistics test presented in the form of statistical graphs or tables. Participants were 413 university students, selected by convenience from two private higher-education institutions, predominantly from the Psychology program (91.5%). The students were 80% female, mostly from the daytime program (69.8%), with ages from 16 to 53 years, mean 24.4 and standard deviation 7.4. The test is predominantly unidimensional and the items are better fitted by the three-parameter logistic model. The indices of discrimination, difficulty and biserial correlation present acceptable values. The results show the difficulties the students have with mathematical and statistical concepts, difficulties also observed in other studies from elementary education onward. It is suggested that these concepts be treated in greater depth in higher education.

  2. The construct equivalence and item bias of the pib/SpEEx conceptualisation-ability test for members of five language groups in South Africa

    OpenAIRE

    Pieter Schaap; Theresa Vermeulen

    2008-01-01

    This study’s objective was to determine whether the Potential Index Batteries/Situation Specific Evaluation Expert (PIB/SpEEx) conceptualisation (100) ability test displays construct equivalence and item bias for members of five selected language groups in South Africa. The sample consisted of a non-probability convenience sample (N = 6 261) of members of five language groups (speakers of Afrikaans, English, North Sotho, Setswana and isiZulu) working in the medical and beverage industries or ...

  3. When BAWE meets WELT: The use of a corpus of student writing to develop items for a proficiency test in grammar and English usage

    Directory of Open Access Journals (Sweden)

    Gerard Paul Sharpling

    2010-08-01

    Full Text Available This article reports on the use of the British Academic Written English (BAWE corpus as a source for developing test items for the Grammar and English Usage section of the Warwick English Language (WELT test in 2007. A key feature of this newly designed multiple choice grammar test was its use of student-generated writing. The extracts used for the re-designed test were derived directly from the BAWE corpus, as opposed to text books, published sources or indeed, simulated extracts of academic writing devised by test developers, which had been the case previously. The rationale for using the BAWE corpus for language test design is outlined, with a particular focus on the attributes of the students’ writing within the corpus, and the inclusion of both first and second language writing. The challenges involved in developing grammar test items based on BAWE corpus data are also presented. While the procedures set out in the paper were undertaken within a specifically British higher education setting, it is hoped that the research will be of interest to test developers and/or researchers in writing skills in other academic settings worldwide.

  4. Item Analysis of Short Tests in Educational Testing: A Comparative Study of Parametric and Non-parametric Item Response Theory

    Institute of Scientific and Technical Information of China (English)

    何壮; 袁淑莉; 赵守盈

    2012-01-01

    Topic-based papers and short tests are a major form of test construction in educational testing. Such tests can be analysed with both parametric and non-parametric item response theory. In this study, the Rasch model and the Mokken model were applied to a geography paper from a senior-three liberal-arts comprehensive examination. Winsteps and Xcalibre were used for the Rasch analysis, yielding difficulty, information, and differential item functioning estimates; MSP was used for the Mokken analysis, yielding proportion-correct statistics and homogeneity coefficients. Comparing the two sets of results led to four conclusions: (1) ordering items by proportion correct under non-parametric item response theory agrees with ordering them by difficulty under parametric item response theory; (2) some items that fail the parametric criteria are still valuable for improving test quality and should not be deleted; (3) for dimensionality checking and item screening, the non-parametric criteria are stricter than the parametric ones; (4) the differential item functioning results of the two approaches are consistent.
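The Rasch model used in the record above is the one-parameter special case of the logistic IRT family: one difficulty per item and a common discrimination. A minimal sketch with hypothetical difficulties, illustrating why ranking items by proportion correct agrees with ranking them by Rasch difficulty:

```python
import math

def p_rasch(theta, b):
    """Rasch model: probability of a correct response for ability theta
    and item difficulty b (one parameter per item, common discrimination)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical difficulties for three items. For any fixed ability,
# success probability falls as difficulty rises, so ranking items by
# observed accuracy rate mirrors ranking them by Rasch difficulty.
difficulties = [-1.0, 0.0, 1.5]
probs = [p_rasch(0.0, b) for b in difficulties]
print([round(p, 3) for p in probs])  # → [0.731, 0.5, 0.182]
```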

  5. The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

    Science.gov (United States)

    Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

    2008-10-01

    Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items, including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified cohort study. Participants were 1002 community-dwelling women aged 65 years or older, all experiencing at least mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6-monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best-performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking-speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatric Society/British Geriatric Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.

  6. Rasch analysis of the Pediatric Evaluation of Disability Inventory-computer adaptive test (PEDI-CAT) item bank for children and young adults with spinal muscular atrophy.

    Science.gov (United States)

    Pasternak, Amy; Sideridis, Georgios; Fragala-Pinkham, Maria; Glanzman, Allan M; Montes, Jacqueline; Dunaway, Sally; Salazar, Rachel; Quigley, Janet; Pandya, Shree; O'Riley, Susan; Greenwood, Jonathan; Chiriboga, Claudia; Finkel, Richard; Tennekoon, Gihan; Martens, William B; McDermott, Michael P; Fournier, Heather Szelag; Madabusi, Lavanya; Harrington, Timothy; Cruz, Rosangel E; LaMarca, Nicole M; Videon, Nancy M; Vivo, Darryl C De; Darras, Basil T

    2016-12-01

    In this study we evaluated the suitability of a caregiver-reported functional measure, the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT), for children and young adults with spinal muscular atrophy (SMA). PEDI-CAT Mobility and Daily Activities domain item banks were administered to 58 caregivers of children and young adults with SMA. Rasch analysis was used to evaluate test properties across SMA types. Unidimensional content for each domain was confirmed. The PEDI-CAT was most informative for type III SMA, with ability levels distributed close to 0.0 logits in both domains. It was less informative for types I and II SMA, especially for mobility skills. Item and person abilities were not distributed evenly across all types. The PEDI-CAT may be used to measure functional performance in SMA, but additional items are needed to identify small changes in function and best represent the abilities of all types of SMA. Muscle Nerve 54: 1097-1107, 2016. © 2016 Wiley Periodicals, Inc.

  7. The construct equivalence and item bias of the pib/SpEEx conceptualisation-ability test for members of five language groups in South Africa

    Directory of Open Access Journals (Sweden)

    Pieter Schaap

    2008-12-01

    Full Text Available This study’s objective was to determine whether the Potential Index Batteries/Situation Specific Evaluation Expert (PIB/SpEEx) conceptualisation (100) ability test displays construct equivalence and item bias for members of five selected language groups in South Africa. The sample consisted of a non-probability convenience sample (N = 6 261) of members of five language groups (speakers of Afrikaans, English, North Sotho, Setswana and isiZulu) working in the medical and beverage industries or studying at higher-educational institutions. Exploratory factor analysis with target rotations confirmed the PIB/SpEEx 100’s construct equivalence for the respondents from these five language groups. No evidence of either uniform or non-uniform item bias of practical significance was found for the sample.

  8. Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing. CRESST Report 830

    Science.gov (United States)

    Cai, Li

    2013-01-01

    Lord and Wingersky's (1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined…
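The Lord-Wingersky recursion referred to above builds the distribution of the summed score one item at a time, at a fixed ability (quadrature) point. A minimal sketch for dichotomous items, with hypothetical response probabilities:

```python
def summed_score_likelihood(p):
    """Lord-Wingersky recursion: distribution of the summed score over
    dichotomous items, given each item's probability of a correct
    response at a fixed ability level.

    p: list of per-item success probabilities P_i(theta).
    Returns L where L[s] = P(summed score == s | theta).
    """
    likelihood = [1.0]  # zero items: score 0 with certainty
    for p_i in p:
        q_i = 1.0 - p_i
        new = [0.0] * (len(likelihood) + 1)
        for s, mass in enumerate(likelihood):
            new[s] += mass * q_i      # item answered incorrectly
            new[s + 1] += mass * p_i  # item answered correctly
        likelihood = new
    return likelihood

# Hypothetical three-item example at one quadrature point:
dist = summed_score_likelihood([0.8, 0.6, 0.5])
print([round(x, 3) for x in dist])  # → [0.04, 0.26, 0.46, 0.24]
```

Summed-score posteriors follow by running this recursion at each quadrature point and weighting by the ability distribution; the multidimensional extension in the report generalizes exactly this step.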

  9. Continuous Online Item Calibration: Parameter Recovery and Item Utilization.

    Science.gov (United States)

    Ren, Hao; van der Linden, Wim J; Diao, Qi

    2017-06-01

    Parameter recovery and item utilization were investigated for different designs for online test item calibration. The design was adaptive in a double sense: it assumed both adaptive testing of examinees from an operational pool of previously calibrated items and adaptive assignment of field-test items to the examinees. Four criteria of optimality for the assignment of the field-test items were used, each of them based on the information in the posterior distributions of the examinee's ability parameter during adaptive testing as well as the sequentially updated posterior distributions of the field-test item parameters. In addition, different stopping rules based on target values for the posterior standard deviations of the field-test parameters and the size of the calibration sample were used. The impact of each of the criteria and stopping rules on the statistical efficiency of the estimates of the field-test parameters and on the time spent by the items in the calibration procedure was investigated. Recommendations as to the practical use of the designs are given.

  10. Confiabilidade teste-reteste do item único de saúde bucal percebida em uma população de adultos no Rio de Janeiro, Brasil

    Directory of Open Access Journals (Sweden)

    Gislaine Afonso-Souza

    2007-06-01

    Full Text Available This study assessed the test-retest reliability of a single-item measure of self-rated oral health included in the questionnaire of a longitudinal study (Pró-Saúde Study, 2001). The questionnaire was administered twice to a sample of 101 employees of a university in Rio de Janeiro, Brazil. Self-rated oral health was assessed with a single item offering five response options, from "very good" to "very poor". Agreement was estimated with the quadratic-weighted kappa statistic (k), stratified by sex, age, income, and schooling. The kappa coefficient for the whole population was 0.80. Higher point estimates were obtained for women (k = 0.84), young adults (k = 0.85), participants with secondary schooling (k = 0.86), and those earning more than six minimum wages (k = 0.91). Lower kappa estimates were found for individuals over 40 years of age (k = 0.67 for 40-49 years; k = 0.69 for 50 or older). The test-retest reliability of the single-item measure ranged from substantial to almost perfect across all population strata, suggesting that this item can be used in future analyses within the Pró-Saúde Study.
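The quadratic-weighted kappa used in this record penalizes disagreements by the squared distance between response categories. A minimal sketch computing it from a test-retest contingency table; the table below is invented for illustration and is not the study's data:

```python
def quadratic_weighted_kappa(table):
    """Quadratic-weighted kappa for a k x k test-retest contingency table.

    Disagreements are weighted by the squared distance between
    categories, normalised by the maximum possible distance.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * table[i][j]                    # observed disagreement
            den += w * row_tot[i] * col_tot[j] / n    # chance disagreement
    return 1.0 - num / den

# Hypothetical 3-category test-retest table with strong agreement:
table = [[20, 5, 0],
         [4, 30, 3],
         [1, 2, 15]]
print(round(quadratic_weighted_kappa(table), 3))  # → 0.788
```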

  11. GRE Verbal Analogy Items: Examinee Reasoning on Items.

    Science.gov (United States)

    Duran, Richard P.; And Others

    Information about how Graduate Record Examination (GRE) examinees solve verbal analogy problems was obtained in this study through protocol analysis. High- and low-ability subjects who had recently taken the GRE General Test were asked to "think aloud" as they worked through eight analogy items. These items varied factorially on the…

  12. What Is the Ability Emotional Intelligence Test (MSCEIT) Good for? An Evaluation Using Item Response Theory: e98827

    National Research Council Canada - National Science Library

    Marina Fiori; Jean-Philippe Antonietti; Moira Mikolajczak; Olivier Luminet; Michel Hansenne; Jérôme Rossier

    2014-01-01

    ...). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability...

  13. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test.

    Science.gov (United States)

    Drachler, Maria de Lourdes; Marshall, Tom; de Carvalho Leite, José Carlos

    2007-03-01

    A method for translating research data from the Denver Test into individual scores of developmental status measured in a continuous scale is presented. It was devised using the Denver Developmental Screening Test (DDST) but can be used for Denver II. The DDST was applied in a community-based survey of 3389 under-5-year-olds in Porto Alegre, Brazil. The items of success were standardised by logistic regression on log chronological age. Each child's ability age was then estimated by maximum likelihood as the age in this reference population corresponding to the child's success and failures in the test. The score of developmental status is the natural logarithm of this ability age divided by chronological age and thus measures the delay or advance in the child's ability age compared with chronological age. This method estimates development status using both difficulty and discriminating power of each item in the reference population, an advantage over scores based on total number of items correctly performed or failed, which depend on difficulty only. The score corresponds with maternal opinion of child developmental status and with the 3-category scale of the DDST. It shows good construct validity, indicated by symmetrical and homogeneous variability from 3 months upwards, and reasonable results in describing gender differences in development by age, the mean score increasing with socio-economic conditions and diminishing among low-birthweight children. If a standardised measure of development status (z-scores) is required, this can be obtained by dividing the score by its standard deviation. Concurrent and discriminant validity of the score must be examined in further studies.
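The scoring method described above can be sketched as follows: each item's success probability is a logistic function of log chronological age, the child's "ability age" is the age at which the observed pattern of passes and failures is most likely, and the developmental score is the log ratio of ability age to chronological age. All item parameters and the age grid below are hypothetical, for illustration only:

```python
import math

def p_success(log_age, intercept, slope):
    """Per-item logistic model on log chronological age, as standardised
    in the reference population (parameters here are hypothetical)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * log_age)))

def ability_age(responses, params, grid=None):
    """Maximum-likelihood ability age: the age (in years) at which the
    child's pattern of passes (1) and failures (0) is most likely."""
    if grid is None:
        grid = [m / 12.0 for m in range(1, 61)]  # 1-60 months, in years

    def loglik(age):
        la = math.log(age)
        ll = 0.0
        for r, (b0, b1) in zip(responses, params):
            p = p_success(la, b0, b1)
            ll += math.log(p if r else 1.0 - p)
        return ll

    return max(grid, key=loglik)

# Hypothetical items (easy, medium, hard) and a 2.0-year-old child
# who passes the two easier items and fails the hardest one:
params = [(1.0, 2.0), (0.0, 2.0), (-2.0, 2.0)]
est = ability_age([1, 1, 0], params)
score = math.log(est / 2.0)  # developmental status: ln(ability/chronological age)
```

A negative score indicates an ability age below chronological age (delay), a positive score an advance; dividing by its standard deviation in the reference population would give the z-score form mentioned in the abstract.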

  14. Limited-Information Goodness-of-Fit Testing of Diagnostic Classification Item Response Theory Models. CRESST Report 840

    Science.gov (United States)

    Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen

    2014-01-01

    It is a well-known problem in testing the fit of models to multinomial data that the full underlying contingency table will inevitably be sparse for tests of reasonable length and for realistic sample sizes. Under such conditions, full-information test statistics such as Pearson's X² and the likelihood ratio statistic…

  15. Developing Pairwise Preference-Based Personality Test and Experimental Investigation of Its Resistance to Faking Effect by Item Response Model

    Science.gov (United States)

    Usami, Satoshi; Sakamoto, Asami; Naito, Jun; Abe, Yu

    2016-01-01

    Recent years have shown increased awareness of the importance of personality tests in educational, clinical, and occupational settings, and developing faking-resistant personality tests is a very pragmatic issue for achieving more precise measurement. Inspired by Stark (2002) and Stark, Chernyshenko, and Drasgow (2005), we develop a pairwise…

  16. Towards an authoring system for item construction

    NARCIS (Netherlands)

    Rikers, Jos H.A.N.

    1988-01-01

    The process of writing test items is analyzed, and a blueprint is presented for an authoring system for test item writing to reduce invalidity and to structure the process of item writing. The developmental methodology is introduced, and the first steps in the process are reported. A historical revi

  17. Unfair items detection in educational measurement

    CERN Document Server

    Bakman, Yefim

    2012-01-01

    Measurement professionals cannot come to an agreement on the definition of the term 'item fairness'. In this paper a continuous measure of item unfairness is proposed. The more the unfairness measure deviates from zero, the less fair the item is. If the measure exceeds the cutoff value, the item is identified as definitely unfair. The new approach can identify unfair items that would not be identified with conventional procedures. The results are in accord with experts' judgments on the item qualities. Since no assumptions are made about score distributions or correlations, the method is applicable to any educational test. Its performance is illustrated through application to scores of a real test.

  18. Análise de itens de uma prova de raciocínio estatístico Analysis of items of a statistical reasoning test

    Directory of Open Access Journals (Sweden)

    Claudette Maria Medeiros Vendramini

    2004-12-01

    Full Text Available This study analysed the 18 multiple-choice questions of a test on basic statistical concepts using both classical and modern test theory. The test was taken by 325 undergraduate students, randomly selected from the humanities, exact sciences, and health sciences. The analysis indicated that the test is predominantly unidimensional and that the items fit the three-parameter logistic model best. The difficulty, discrimination, and biserial correlation indexes show acceptable values. It is suggested that new items be added to the test in order to establish reliability and validity for the educational context and to reveal undergraduate students' statistical reasoning when reading representations of statistical data.

  19. On-Demand Testing and Maintaining Standards for General Qualifications in the UK Using Item Response Theory: Possibilities and Challenges

    Science.gov (United States)

    He, Qingping

    2012-01-01

    Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues…

  20. Test-retest reliability of Antonovsky's 13-item sense of coherence scale in patients with hand-related disorders

    DEFF Research Database (Denmark)

    Hansen, Alice Ørts; Kristensen, Hanne Kaae; Cederlund, Ragnhild

    2017-01-01

    to be a powerful tool to measure the ICF component personal factors, which could have an impact on patients' rehabilitation outcomes. Implications for rehabilitation Antonovsky's SOC-13 scale showed test-retest reliability for patients with hand-related disorders. The SOC-13 scale could be a suitable tool to help...

  1. The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease

    DEFF Research Database (Denmark)

    Jørgensen, Kasper; Kristensen, Maria K; Waldemar, Gunhild

    2015-01-01

    This study presents a reliable, short and practical version of the Clock Drawing Test (CDT) for clinical use and examines its diagnostic accuracy in mild Alzheimer's disease versus elderly nonpatients. Clock drawings from 231 participants were scored independently by four clinical neuropsychologi...

  2. Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients

    DEFF Research Database (Denmark)

    Giesinger, Johannes M; Aa Petersen, Morten; Groenvold, Mogens;

    2011-01-01

    Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture...

  3. An Evaluation of One- and Three-Parameter Logistic Tailored Testing Procedures for Use with Small Item Pools.

    Science.gov (United States)

    McKinley, Robert L.; Reckase, Mark D.

    A two-stage study was conducted to compare the ability estimates yielded by tailored testing procedures based on the one-parameter logistic (1PL) and three-parameter logistic (3PL) models. The first stage of the study employed real data, while the second stage employed simulated data. In the first stage, response data for 3,000 examinees were…

  4. How to Get Really Smart: Modeling Retest and Training Effects in Ability Testing using Computer-Generated Figural Matrix Items

    Science.gov (United States)

    Freund, Philipp Alexander; Holling, Heinz

    2011-01-01

    The interpretation of retest scores is problematic because they are potentially affected by measurement and predictive bias, which impact construct validity, and because their size differs as a function of various factors. This paper investigates the construct stability of scores on a figural matrices test and models retest effects at the level of…

  5. Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients

    NARCIS (Netherlands)

    Giesinger, J.M.; Petersen, M.A.; Groenvold, M.; Aaronson, N.K.; Arraras, J.I.; Conroy, T.; Gamper, E.M.; Kemmler, G.; King, M.T.; Oberguggenberger, A.S.; Velikova, G.; Young, T.; Holzner, B.

    2011-01-01

    Introduction Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to c

  7. The Role of Different Cognitive Components in Predicting the Item Difficulty of Figural Reasoning Tests

    Institute of Scientific and Technical Information of China (English)

    李中权; 王力; 张厚粲; 周仁来

    2011-01-01

    Figural reasoning tests (as represented by Raven's tests) are widely applied as effective measures of fluid intelligence in recruitment and personnel selection. However, several studies have revealed that these tests are no longer appropriate because of high item-exposure rates. Computerized automatic item generation (AIG) has gradually been recognized as a promising technique for handling item exposure. Understanding the sources of item variation constitutes the initial stage of computerized AIG, that is, searching for the underlying processing components and the stimuli that significantly influence those components. Some studies have explored sources of item variation, but so far there are no consistent results. This study investigated the relation between item difficulties and stimulus factors (e.g., familiarity of figures, abstraction of attributes, perceptual organization, and memory load) and determined the relative importance of those factors in predicting item difficulties. Eight sets of figural reasoning tests (each containing 14 items imitating items from Raven's Advanced Progressive Matrices, APM) were constructed by manipulating the familiarity of figures, the degree of abstraction of attributes, the perceptual organization, and the types and number of rules. Using an anchor-test design, the tests were administered via the internet to 6323 participants, with 10 items drawn from the APM as anchor items; thus, each participant completed 14 items from one of the eight sets plus the 10 anchor items within half an hour. To prevent participants from using a response-elimination strategy, we presented the item stem first and then the alternatives in turn, asking participants to determine which alternative was the best. DIMTEST analyses were conducted on the participants' responses to each of the eight tests. Results showed that the items measure a single dimension on each test. A likelihood ratio test indicated that the data fit the two-parameter logistic model (2PL) best. Items were

  8. Assessment of free and cued recall in Alzheimer's disease and vascular and frontotemporal dementia with 24-item Grober and Buschke test.

    Science.gov (United States)

    Cerciello, Milena; Isella, Valeria; Proserpi, Alice; Papagno, Costanza

    2017-01-01

    Alzheimer's disease (AD), vascular dementia (VaD) and frontotemporal dementia (FTD) are the most common forms of dementia. It is well known that memory deficits in AD differ from those in VaD and FTD, especially with respect to cued recall. The aim of this clinical study was to compare memory performance in 15 AD, 10 VaD and 9 FTD patients and 20 normal controls by means of the 24-item Grober-Buschke test [8]. The patient groups were comparable in terms of severity of dementia. We considered free and total recall (free plus cued), both immediate and delayed, and computed an Index of Sensitivity to Cueing (ISC) [8] for the immediate and delayed trials. We assessed whether cued recall predicted subsequent free recall across the patient groups. We found that AD patients recalled fewer items from the beginning and were less sensitive to cueing, supporting the hypothesis that memory disorders in AD depend on an encoding and storage deficit. In immediate recall, VaD and FTD patients showed similar memory performance and stronger sensitivity to cueing than AD patients, suggesting that their memory disorders are due to a difficulty in spontaneously implementing efficient retrieval strategies. However, the ISC was lower in delayed than in immediate recall in VaD compared with FTD, owing to greater forgetting in VaD.

  9. Compreensão da leitura: análise do funcionamento diferencial dos itens de um Teste de Cloze Reading comprehension: differential item functioning analysis of a Cloze Test

    Directory of Open Access Journals (Sweden)

    Katya Luciane Oliveira

    2012-01-01

    Full Text Available This study investigated the fit of a Cloze test to the Rasch model and evaluated differential item functioning (DIF) by gender. Participants were 573 students from the 5th to 8th grades of state public schools in the states of São Paulo and Minas Gerais, Brazil. The Cloze test was administered collectively. The analysis showed a good fit of the instrument to the Rasch model, and the items were answered according to the expected pattern, likewise showing good fit. Regarding DIF, only three items functioned differentially by gender. Overall, the data indicated a balance between the answers given by boys and girls.

  10. Research on Test Items and Methods for Pressure Validators

    Institute of Scientific and Technical Information of China (English)

    林景星; 林勤; 王孔祥; 王永红

    2015-01-01

    Starting from the working principle of the pressure gauge validator, this paper analyses, from the standpoint of its metrological characteristics, test items and methods including appearance and functional checks, pressure-strength testing and sealing (leak-tightness) testing. Through actual trials of these test items, the study establishes technical requirements for the validator's pressure strength and for its sealing performance. The results show that the proposed inspection and test items for pressure validators are practicable and can serve as a test method for validators used in pressure metrological verification and calibration.

  11. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    Science.gov (United States)

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
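A standardized residual of the kind discussed above (a simplified form of the Hambleton et al. residual, not the ratio estimator the paper proposes) compares the observed proportion correct in an ability group with the model-implied probability; it is approximately standard normal when the model fits. A sketch with invented numbers:

```python
import math

def standardized_residual(observed_correct, n, p_model):
    """Standardized residual for one item in one ability group:
    observed proportion correct versus the model-implied probability.
    Approximately standard normal under model fit (large n)."""
    p_obs = observed_correct / n
    se = math.sqrt(p_model * (1.0 - p_model) / n)  # binomial standard error
    return (p_obs - p_model) / se

# Hypothetical group of 200 examinees where the fitted IRT model
# predicts a 70% success rate and 135 answered correctly:
z = standardized_residual(135, 200, 0.70)
print(round(z, 2))  # → -0.77
```

Values of |z| well above 2 across several ability groups would flag the item as misfitting.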

  12. Real and Artificial Differential Item Functioning

    Science.gov (United States)

    Andrich, David; Hagquist, Curt

    2012-01-01

    The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) among two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…

  13. The Multidimensionality of Verbal Analogy Items

    Science.gov (United States)

    Ullstadius, Eva; Carlstedt, Berit; Gustafsson, Jan-Eric

    2008-01-01

    The influence of general and verbal ability on each of 72 verbal analogy test items were investigated with new factor analytical techniques. The analogy items together with the Computerized Swedish Enlistment Battery (CAT-SEB) were given randomly to two samples of 18-year-old male conscripts (n = 8566 and n = 5289). Thirty-two of the 72 items had…

  14. Measuring outcomes in allergic rhinitis: psychometric characteristics of a Spanish version of the congestion quantifier seven-item test (CQ7)

    Directory of Open Access Journals (Sweden)

    Mullol Joaquim

    2011-03-01

    Full Text Available Abstract Background No control tools for nasal congestion (NC) are currently available in Spanish. This study aimed to adapt and validate the Congestion Quantifier Seven Item Test (CQ7) for Spain. Methods CQ7 was adapted from English following international guidelines. The instrument was validated in an observational, prospective study in allergic rhinitis patients with NC (N = 166) and a control group without NC (N = 35). Participants completed the CQ7, the MOS sleep questionnaire, and a measure of psychological well-being (PGWBI). Clinical data included NC severity rating, acoustic rhinometry, and total symptom score (TSS). Internal consistency was assessed using Cronbach's alpha and test-retest reliability using the intraclass correlation coefficient (ICC). Construct validity was tested by examining correlations with other outcome measures and the ability to discriminate between groups classified by NC severity. Sensitivity and specificity were assessed using the area under the receiver operating curve (AUC) and responsiveness over time using effect sizes (ES). Results Cronbach's alpha for the CQ7 was 0.92, and the ICC was 0.81, indicating good reliability. CQ7 correlated most strongly with the TSS (r = 0.60). Conclusions The Spanish version of the CQ7 is appropriate for detecting, measuring, and monitoring NC in allergic rhinitis patients.
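The internal-consistency statistic reported in records like this one, Cronbach's alpha, can be computed directly from item-level scores. A minimal sketch with made-up ratings (five hypothetical respondents on a 7-item scale, not the study's data):

```python
def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents-by-items score matrix."""
    k = len(rows[0])          # number of items
    def var(xs):              # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([r[i] for r in rows]) for i in range(k)]
    total_var = var([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical respondents answering a 7-item scale (scores 1-5).
data = [
    [1, 2, 1, 2, 1, 2, 1],
    [3, 3, 4, 3, 3, 4, 3],
    [5, 4, 5, 5, 4, 5, 5],
    [2, 2, 3, 2, 2, 2, 3],
    [4, 5, 4, 4, 5, 4, 4],
]
alpha = cronbach_alpha(data)
```

Alpha near 1 indicates that the items covary strongly relative to the variance of the total score, the sense in which the CQ7's 0.92 indicates good reliability.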

  15. [A factor analytic study of the items for the personality description based on the principle of the three traits theory for the work curve of addition of the Uchida-Kraepelin psychodiagnostic test].

    Science.gov (United States)

    Kashiwagi, S; Yanai, H; Aoki, T; Tamai, H; Tanaka, Y; Hokugoh, K

    1985-08-01

    An inventory form for personality description based on the principle of the three-traits theory for the work curve of the Uchida-Kraepelin psychodiagnostic test was investigated from a factor-analytic point of view. Three analyses were performed. First, the 66 items of the present form were administered and a tentative orthogonal factor solution was obtained. Second, 20 items forming a simplified pattern in the sense of "simple structure" were selected based on the result of the first factor rotation, and a further orthogonal factor rotation was applied to the data from the selected items, so that the assumptions for the three traits, primacy (A), variability (B), and recency (C), were confirmed factor-analytically. Finally, in order to provide an extended form for academic and practical use, 10 more items were added to the 20 items of the second analysis after applying a third orthogonal factor rotation, yielding a new form consisting of 30 items. Some relationships between the present work and that of Eysenck and Eysenck (1968) are discussed.

  16. Auditory Musical Reasoning Test (RAu): An Initial Study with Item Response Theory

    Directory of Open Access Journals (Sweden)

    Fernando Pessotto

    2012-12-01

    Full Text Available This study sought evidence of validity, based on internal structure and on a criterion, for an instrument assessing auditory processing of musical abilities (Auditory Processing Test with Musical Stimuli, RAu). A total of 162 people of both sexes were assessed, 56.8% men, aged 15 to 59 years (M = 27.5; SD = 9.01). Participants were divided into musicians (N = 24), amateurs (N = 62), and laypeople (N = 76) according to their level of musical knowledge. Full Information Factor Analysis was used to verify the dimensionality of the instrument, and the properties of the items were examined through Item Response Theory (IRT). In addition, the instrument's capacity to discriminate between the groups of musicians and non-musicians was investigated. The data provide evidence that the items measure one principal dimension (alpha = 0.92), with high capacity to differentiate professional musicians, amateurs, and laypeople, yielding a criterion validity coefficient of r = 0.68. The results indicate positive evidence of reliability and validity for the RAu.

  17. The Detection and Influence of Problematic Item Content in Ability Tests: An Examination of Sensitivity Review Practices for Personnel Selection Test Development

    Science.gov (United States)

    Grand, James A.; Golubovich, Juliya; Ryan, Ann Marie; Schmitt, Neal

    2013-01-01

    In organizational and educational practices, sensitivity reviews are commonly advocated techniques for reducing test bias and enhancing fairness. In the present paper, results from two studies are reported which investigate how effective individuals are at detecting problematic test content and the influence such content has on important testing…

  18. Validation of the Ten-Item Internet Gaming Disorder Test (IGDT-10) and evaluation of the nine DSM-5 Internet Gaming Disorder criteria.

    Science.gov (United States)

    Király, Orsolya; Sleczka, Pawel; Pontes, Halley M; Urbán, Róbert; Griffiths, Mark D; Demetrovics, Zsolt

    2017-01-01

    The inclusion of Internet Gaming Disorder (IGD) in the DSM-5 (Section 3) has given rise to much scholarly debate regarding the proposed criteria and their operationalization. The present study's aim was threefold: to (i) develop and validate a brief psychometric instrument (Ten-Item Internet Gaming Disorder Test; IGDT-10) to assess IGD using the definitions suggested in the DSM-5, (ii) contribute to the ongoing debate regarding the usefulness and validity of each of the nine IGD criteria (using Item Response Theory [IRT]), and (iii) investigate the cut-off threshold suggested in the DSM-5. An online sample of 4887 gamers (age range 14-64 years, mean age 22.2 years [SD = 6.4], 92.5% male) was collected through Facebook and a gaming-related website with the cooperation of a popular Hungarian gaming magazine. A shopping voucher worth approximately 300 Euros was raffled among participants to boost participation (i.e., a lottery incentive). Confirmatory factor analysis and a structural regression model were used to test the psychometric properties of the IGDT-10, and IRT analysis was conducted to test the measurement performance of the nine IGD criteria. Finally, latent class analysis along with sensitivity and specificity analysis were used to investigate the cut-off threshold proposed in the DSM-5. The analysis supported the IGDT-10's validity, reliability, and suitability for use in future research. Findings of the IRT analysis suggest IGD is manifested through a different set of symptoms depending on the severity of the disorder. More specifically, "continuation", "preoccupation", "negative consequences" and "escape" were associated with lower severity of IGD, while "tolerance", "loss of control", "giving up other activities" and "deception" were associated with more severe levels. "Preoccupation" and "escape" provided very little information for estimating IGD severity. Finally, the threshold suggested in the DSM-5 was supported by the statistical analyses. The IGDT-10 is a valid and reliable instrument for assessing IGD as proposed in the DSM-5.
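Evaluating a diagnostic cut-off as described above reduces to computing sensitivity and specificity against a criterion grouping. A minimal sketch with hypothetical data (the DSM-5 suggests endorsing at least 5 of the 9 criteria; the counts and labels below are invented for illustration):

```python
def sens_spec(counts, labels, cutoff=5):
    """Sensitivity and specificity of a symptom-count cut-off.

    counts: number of endorsed criteria per person (0-9);
    labels: True if the person is classified as disordered by the criterion.
    """
    tp = sum(1 for c, y in zip(counts, labels) if c >= cutoff and y)
    fn = sum(1 for c, y in zip(counts, labels) if c < cutoff and y)
    tn = sum(1 for c, y in zip(counts, labels) if c < cutoff and not y)
    fp = sum(1 for c, y in zip(counts, labels) if c >= cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical endorsed-criteria counts and criterion-group labels.
counts = [0, 1, 2, 5, 6, 7, 4, 9, 3, 5]
labels = [False, False, False, True, True, True, False, True, False, False]
sens, spec = sens_spec(counts, labels)
```

A supported threshold is one where both values stay high; lowering the cut-off trades specificity for sensitivity.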

  19. Faculty development on item writing substantially improves item quality.

    Science.gov (United States)

    Naeem, Naghma; van der Vleuten, Cees; Alfaris, Eiad Abdelmohsen

    2012-08-01

    The quality of items written for in-house examinations in medical schools remains a cause for concern, and several faculty development programs aim to improve faculty's item writing skills. The purpose of this study was to evaluate the effectiveness of a faculty development program in item development; an objective method was developed and used to assess improvement in faculty's competence to develop high-quality test items. This was a quasi-experimental study with a pretest-midtest-posttest design. A convenience sample of 51 faculty members participated. Structured checklists were used to assess the quality of test items at each phase of the study, and group scores were analyzed using repeated-measures analysis of variance. The results showed a significant increase in participants' mean scores on Multiple Choice Question, Short Answer Question, and Objective Structured Clinical Examination checklists from pretest to posttest. The study shows that items written without faculty development are generally lacking in quality, and it provides evidence of the value of faculty development in improving the quality of items generated by faculty.

  20. Corn Breeding Test Investigation Record Items and Norms

    Institute of Scientific and Technical Information of China (English)

    高会林; 高玮; 杨桂英

    2003-01-01

    This article describes how an orderly, scientific combination of the investigation record items and norms used in routine corn breeding with the trait descriptions required when applying for new variety rights protection, identifying commonalities and distinguishing differences, makes the record items and norms complete as well as precise, and thus truly reflects the corn's characteristics and specific traits.

  1. Adapting Item Format for Cultural Effects in Translated Tests: Cultural Effects on Construct Validity of the Chinese Versions of the MBTI

    Science.gov (United States)

    Osterlind, Steven J.; Miao, Danmin; Sheng, Yanyan; Chia, Rosina C.

    2004-01-01

    This study investigated the interaction between different cultural groups and item type, and the ensuing effect on construct validity for a psychological inventory, the Myers-Briggs Type Indicator (MBTI, Form G). The authors analyzed 94 items from 2 Chinese-translated versions of the MBTI (Form G) for factorial differences among groups of…

  2. Het nut van item respons theorie bij de constructie en evaluatie van niet-cognitieve instrumenten voor selectie en assessment binnen organisaties. : (The usefulness of item response theory for the construction and evaluation of noncognitive tests in personnel selection and assessment.)

    NARCIS (Netherlands)

    Egberink, Iris J. L.; Meijer, Rob R.

    In this article we discuss the use of IRT for the development and application of noncognitive measures in personnel selection and career development. We introduce the basic principles of IRT and we discuss the usefulness of IRT to evaluate the quality of items and tests to assess the measurement

  3. Het nut van item respons theorie bij de constructie en evaluatie van niet-cognitieve instrumenten voor selectie en assessment binnen organisaties. : (The usefulness of item response theory for the construction and evaluation of noncognitive tests in personnel selection and assessment.)

    NARCIS (Netherlands)

    Egberink, Iris J. L.; Meijer, Rob R.

    2012-01-01

    In this article we discuss the use of IRT for the development and application of noncognitive measures in personnel selection and career development. We introduce the basic principles of IRT and we discuss the usefulness of IRT to evaluate the quality of items and tests to assess the measurement pre

  4. A Comparison of Item Selection Methods for Cognitive Diagnostic Computerized Adaptive Testing with Nonstatistical Constraints

    Institute of Scientific and Technical Information of China (English)

    毛秀珍; 辛涛

    2014-01-01

    It is well known that items in the bank of a computerized adaptive test (CAT) should be used about equally often. For one thing, much of the manpower and financial resources spent on constructing the item bank is wasted if a large proportion of items are seldom exposed or never used. For another, ensuring test security and maintaining the item bank becomes a serious burden for test practitioners if item exposure is extremely skewed. In addition to controlling item exposure, the tests assembled for different examinees usually must satisfy many constraints, such as (a) appropriate proportions of each content domain; (b) "enemy items" may not appear in the same test; and (c) an appropriate balance of item keys. If constraints are violated, examinees may react in unexpected ways during the test, resulting in inaccurate trait estimates. Therefore, both item exposure control and content constraints are important non-statistical constraints. They strongly influence test validity, measurement accuracy, and comparability among examinees, so they need to be incorporated into the design of item selection for CAT in practical settings. When cognitive diagnostic theory is used in CAT, examinees can receive more detailed diagnostic information regarding their mastery of each attribute. Cognitive diagnostic CAT (CD-CAT) is therefore a promising research area and has gained much attention because it integrates both cognitive diagnosis and adaptive testing. The present study compared the performance of five item selection methods in CD-CAT with item exposure control and content constraints. The item selection methods applied are (a) incorporating the Monte Carlo approach into the item eligibility approach (MC-IE); (b) incorporating the maximum priority index method into the Monte Carlo approach (MC-MPI); (c) incorporating the restrictive threshold method into the Monte

  5. Australian Item Bank Program: Social Science Item Bank.

    Science.gov (United States)

    Australian Council for Educational Research, Hawthorn.

    After rigorous review, editing, and trial testing, this item bank was compiled to help secondary school teachers construct objective tests in the social sciences. Anthropology, economics, ethnic and cultural studies, geography, history, legal studies, politics, and sociology are among the topics represented. The bank consists of multiple choice…

  6. New decision criteria for selecting delta check methods based on the ratio of the delta difference to the width of the reference range can be generally applicable for each clinical chemistry test item.

    Science.gov (United States)

    Park, Sang Hyuk; Kim, So-Young; Lee, Woochang; Chun, Sail; Min, Won-Ki

    2012-09-01

    Many laboratories use 4 delta check methods: delta difference, delta percent change, rate difference, and rate percent change. However, guidelines regarding decision criteria for selecting delta check methods have not yet been provided. We present new decision criteria for selecting delta check methods for each clinical chemistry test item. We collected 811,920 and 669,750 paired (present and previous) test results for 27 clinical chemistry test items from inpatients and outpatients, respectively. We devised new decision criteria for the selection of delta check methods based on the ratio of the delta difference to the width of the reference range (DD/RR). Delta check methods based on these criteria were compared with those based on the CV% of the absolute delta difference (ADD) as well as those reported in 2 previous studies. The delta check methods suggested by new decision criteria based on the DD/RR ratio corresponded well with those based on the CV% of the ADD except for only 2 items each in inpatients and outpatients. Delta check methods based on the DD/RR ratio also corresponded with those suggested in the 2 previous studies, except for 1 and 7 items in inpatients and outpatients, respectively. The DD/RR method appears to yield more feasible and intuitive selection criteria and can easily explain changes in the results by reflecting both the biological variation of the test item and the clinical characteristics of patients in each laboratory. We suggest this as a measure to determine delta check methods.
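The four delta check quantities named above, and the proposed DD/RR ratio, are simple arithmetic on a pair of results. A sketch with an illustrative serum potassium result and reference range (values invented for illustration, not taken from the paper):

```python
def delta_checks(prev, curr, days_between, ref_low, ref_high):
    """Return the four common delta check quantities plus the DD/RR ratio."""
    dd = curr - prev                   # delta difference
    dpc = 100.0 * dd / prev            # delta percent change
    rd = dd / days_between             # rate difference (per day)
    rpc = dpc / days_between           # rate percent change (per day)
    dd_rr = dd / (ref_high - ref_low)  # delta difference / reference-range width
    return dd, dpc, rd, rpc, dd_rr

# Hypothetical: potassium 4.0 -> 5.0 mmol/L over 2 days, reference range 3.5-5.1.
dd, dpc, rd, rpc, dd_rr = delta_checks(4.0, 5.0, 2, 3.5, 5.1)
```

The DD/RR ratio expresses the change relative to the reference-range width, which is what lets the criterion reflect both biological variation of the analyte and the clinical setting.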

  7. Unqualified Items and Cause Analysis in Hand-held Electric Tools Routine Testing

    Institute of Scientific and Technical Information of China (English)

    高家材; 陈其勇; 郑小龙

    2015-01-01

    Based on routine testing, in particular CCC certification testing, supervision spot checks, and various local spot checks, this paper introduces the nonconforming items commonly encountered when testing hand-held electric tools and briefly analyzes the causes of those nonconformities.

  8. Dynamic and Comprehensive Item Selection Strategies for Computerized Adaptive Testing Based on the Graded Response Model

    Institute of Scientific and Technical Information of China (English)

    罗芬; 丁树良; 王晓庆

    2012-01-01

    Item selection strategy (ISS) is a core component of Computerized Adaptive Testing (CAT). Polytomous items provide more information about an examinee than dichotomous items, and adopting polytomously scored items is a research direction of CAT. The most widely used ISS is the maximum Fisher information (MFI) criterion, which raises concerns about the cost-efficiency of pool utilization and poses security risks for CAT programs. Chang & Ying (1999) and Chang, Qian, & Ying (2001) proposed two alternative item selection procedures, the a-stratified method (a-STR) and the a-stratified with b-blocking method (b-STR), based on the dichotomous model, with the goal of remedying the item overexposure and underexposure produced by MFI. However, a-STR and b-STR are static techniques because the items are stratified according to the given information at the beginning of the test. Based on the graded response model (GRM), a technique reducing the dimensionality of the difficulty (or step) parameters was recently employed to construct some ISSs. The limitation of this dimension-reduction technique is that it loses a lot of information. Thus, in order to improve on MFI, two new item selection methods are proposed based on the GRM: (1) modify the technique of reducing the dimensionality of the difficulty (or step) parameters by integrating interval estimation; (2) implement dynamic a-STR and dynamic b-STR methods during the testing process. On one hand, these new ISSs avoid the limitations of MFI while making good use of the advantages of the Fisher information function (FIF); the FIF compresses all item parameters and ability parameters, so it is in nature a comprehensive tool for all parameters. On the other hand, the new ISSs employ the property that the FIF represents the inverse of the variance of the ability estimate: let £ be the square root of the reciprocal of the Fisher information, d be the absolute deviation between the
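For contrast with the stratified and dynamic strategies discussed above, the baseline MFI criterion simply administers the unused item with maximum Fisher information at the current ability estimate. A sketch using the dichotomous 2PL for brevity (the study itself works with the graded response model), where the item information is I(theta) = a^2 * P * (1 - P):

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, pool, administered):
    """Return the index of the unused item with maximum information at theta."""
    return max((i for i in range(len(pool)) if i not in administered),
               key=lambda i: info_2pl(theta, *pool[i]))

# Hypothetical pool of (a, b) parameter pairs; item 3 was already given.
pool = [(0.8, -1.0), (1.5, 0.2), (1.2, 1.5), (2.0, 0.1)]
picked = select_item(theta=0.0, pool=pool, administered={3})
```

Because high-a items always win under this rule, they are overexposed, which is exactly the problem the a-stratified variants address by restricting early selections to low-a strata.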

  9. A Nonparametric Item Analysis of the Modified Bender Gestalt Test

    Directory of Open Access Journals (Sweden)

    César Merino Soto

    2009-05-01

    Full Text Available This research is a psychometric study of a new scoring system for the Bender Gestalt test modified for children, the Qualitative Scoring System (Brannigan & Brunner, 2002), in a sample of 244 first-grade children from four public schools in Lima. The approach applied is nonparametric item analysis using the TestGraf computer program (Ramsay, 1991). The findings indicate appropriate levels of internal consistency, unidimensionality, and good discriminative capacity of the scoring categories of the Qualitative Scoring System. No demographic differences by gender or age were found. The findings are discussed in the context of the potential use of the Qualitative Scoring System and of nonparametric item analysis in psychometric research.

  10. Principles and procedures of considering item sequence effects in the development of calibrated item pools: Conceptual analysis and empirical illustration

    Directory of Open Access Journals (Sweden)

    Safir Yousfi

    2012-12-01

    Full Text Available Item responses can be context-sensitive. Consequently, composing test forms flexibly from a calibrated item pool requires considering potential context effects. This paper focuses on context effects that are related to the item sequence. It is argued that sequence effects are not necessarily a violation of item response theory but that item response theory offers a powerful tool to analyze them. If sequence effects are substantial, test forms cannot be composed flexibly on the basis of a calibrated item pool, which precludes applications like computerized adaptive testing. In contrast, minor sequence effects do not thwart applications of calibrated item pools. Strategies to minimize the detrimental impact of sequence effects on item parameters are discussed and integrated into a nomenclature that addresses the major features of item calibration designs. An example of an item calibration design demonstrates how this nomenclature can guide the process of developing a calibrated item pool.

  11. Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

    Science.gov (United States)

    Magis, David; Facon, Bruno

    2013-01-01

    Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…

  12. A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

    Science.gov (United States)

    Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

    2017-04-11

    The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy.

  13. Reversed item bias: an integrative model.

    Science.gov (United States)

    Weijters, Bert; Baumgartner, Hans; Schillewaert, Niels

    2013-09-01

    In the recent methodological literature, various models have been proposed to account for the phenomenon that reversed items (defined as items for which respondents' scores have to be recoded in order to make the direction of keying consistent across all items) tend to lead to problematic responses. In this article we propose an integrative conceptualization of three important sources of reversed item method bias (acquiescence, careless responding, and confirmation bias) and specify a multisample confirmatory factor analysis model with 2 method factors to empirically test the hypothesized mechanisms, using explicit measures of acquiescence and carelessness and experimentally manipulated versions of a questionnaire that varies 3 item arrangements and the keying direction of the first item measuring the focal construct. We explain the mechanisms, review prior attempts to model reversed item bias, present our new model, and apply it to responses to a 4-item self-esteem scale (N = 306) and the 6-item Revised Life Orientation Test (N = 595). Based on the literature review and the empirical results, we formulate recommendations on how to use reversed items in questionnaires.
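The recoding that defines a reversed item in this literature is a one-line transformation: on a 1-to-k Likert scale, a reverse-keyed response x is rescored as k + 1 - x so that all items point in the same direction before summing. A sketch with hypothetical responses on a 4-item, 5-point scale:

```python
def recode(responses, reversed_idx, k=5):
    """Recode reverse-keyed items on a 1..k scale so all items are keyed alike."""
    return [(k + 1 - x) if i in reversed_idx else x
            for i, x in enumerate(responses)]

# Hypothetical 4-item scale where items 1 and 3 are reverse-keyed.
raw = [4, 2, 5, 1]
recoded = recode(raw, reversed_idx={1, 3})
```

Method biases such as acquiescence and careless responding act on the raw responses, before this recoding, which is why reversed items can behave differently from regular items even after rescoring.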

  14. On-Line Item Attribute Identification in Cognitive Diagnostic Computerized Adaptive Testing

    Institute of Scientific and Technical Information of China (English)

    汪文义; 丁树良; 游晓锋

    2011-01-01

    Developing items for cognitive diagnostic tests is costly, and calibrating the attributes of a large number of items is time-consuming and laborious, a difficult task even for experts. No research on calibrating item attributes within computerized adaptive diagnostic testing has yet been reported. Building on an existing small item bank developed for diagnostic testing, this paper embeds raw (uncalibrated) items during cognitive diagnostic computerized adaptive testing and investigates attribute calibration, focusing on calibration methods and the factors that influence them. Besides the MMLE and MLE methods, a new "intersection" method applicable to all non-compensatory cognitive diagnostic models is proposed. Monte Carlo simulations show that MMLE outperforms MLE; when knowledge states are estimated accurately, adaptively embedding raw items has some advantage over embedding them at random. As knowledge-state estimation accuracy and the number of responses to raw items increase, the intersection method performs about as well as MLE, except under divergent and unstructured attribute hierarchies, while requiring no pre-specified item parameter values. Cognitive Diagnostic Assessment (CDA), combining psychometrics and cognitive science, has received increased attention recently, but it is still in its infancy (Leighton and Gierl, 2007). CDA based on the incidence Q-matrix (Tatsuoka, 1990) is quite different from traditional Item Response Theory. The entries in each column of the incidence Q-matrix indicate which skills and knowledge are involved in the solution of each item, so the Q-matrix plays an important role in establishing the relation between latent knowledge states and ideal response patterns, providing information about students' cognitive strengths and weaknesses. CDA also requires specifying which latent attributes are measured by the test items and how these attributes relate to one another. Leighton, Gierl and Hunka (2004) described the logic of the Attribute Hierarchy Method (AHM) as follows: first, the hierarchy of attributes must be specified through protocol techniques before test item construction

  15. Investigating the Population Sensitivity Assumption of Item Response Theory True-Score Equating across Two Subgroups of Examinees and Two Test Formats

    Science.gov (United States)

    von Davier, Alina A.; Wilson, Christine

    2008-01-01

    Dorans and Holland (2000) and von Davier, Holland, and Thayer (2003) introduced measures of the degree to which an observed-score equating function is sensitive to the population on which it is computed. This article extends the findings of Dorans and Holland and of von Davier et al. to item response theory (IRT) true-score equating methods that…

  16. Simplified scoring of the Actionable 8-item screening questionnaire for neurogenic bladder overactivity in multiple sclerosis: a comparative analysis of test performance at different cut-off points

    NARCIS (Netherlands)

    Jongen, Peter Joseph; Blok, Bertil F.; Heesakkers, John P.; Heerings, Marco; Lemmens, Wim A.; Donders, Rogier

    2015-01-01

    Background: The Actionable questionnaire is an 8-item tool to screen patients with multiple sclerosis (MS) for neurogenic bladder problems, identifying those patients who might benefit from urological referral and bladder-specific treatment. The original scoring yields a total score of 0 to 24 with

  17. Simplified scoring of the Actionable 8-item screening questionnaire for neurogenic bladder overactivity in multiple sclerosis: a comparative analysis of test performance at different cut-off points

    NARCIS (Netherlands)

    Jongen, P.J.; Blok, B.F.; Heesakkers, J.P.F.A.; Heerings, M.; Lemmens, W.A.J.G.; Donders, R.

    2015-01-01

    BACKGROUND: The Actionable questionnaire is an 8-item tool to screen patients with multiple sclerosis (MS) for neurogenic bladder problems, identifying those patients who might benefit from urological referral and bladder-specific treatment. The original scoring yields a total score of 0 to 24 with

  19. Differential Item Functioning Analysis of the Science and Mathematics Items in the University Entrance Examinations in Turkey

    Science.gov (United States)

    Kalaycioglu, Dilara Bakan; Berberoglu, Giray

    2011-01-01

    This study is aimed to detect differential item functioning (DIF) items across gender groups, analyze item content for the possible sources of DIF, and eventually investigate the effect of DIF items on the criterion-related validity of the test scores in the quantitative section of the university entrance examination (UEE) in Turkey. The reason…

  20. Psychometric Changes on Item Difficulty Due to Item Review by Examinees

    Directory of Open Access Journals (Sweden)

    Elena C. Papanastasiou

    2015-01-01

    Full Text Available If good measurement depends in part on the estimation of accurate item characteristics, it is essential that test developers become aware of discrepancies that may exist on the item parameters before and after item review. The purpose of this study was to examine the answer changing patterns of students while taking paper-and-pencil multiple choice exams, and to examine how these changes affect the estimation of item difficulty parameters. The results of this study have shown that item review by examinees does produce some changes to the examinee ability estimates and to the item difficulty parameters. In addition, these effects are more pronounced in shorter tests than in longer tests. In turn, these small changes produce larger effects when estimating the changes in the information values of each student's test score.

  1. Testing the Personal Wellbeing Index on 12-16-Year-Old Adolescents in 3 Different Countries with 2 New Items

    Science.gov (United States)

    Casas, Ferran; Sarriera, Jorge Castella; Alfaro, Jaime; Gonzalez, Monica; Malo, Sara; Bertran, Irma; Figuer, Cristina; da Cruz, Daniel Abs; Bedin, Livia; Paradiso, Angela; Weinreich, Karin; Valdenegro, Boris

    2012-01-01

    The 7-item adult version of the Personal Wellbeing scale (Cummins et al. Social Indic Res 64:159-190, 2003) was administered to two samples of adolescents aged 12-16 in Brazil (N = 1,588) and Spain (N = 2,900), and to a sample of adolescents aged 14-16 in Chile (N = 843). The results obtained were analyzed to determine its psychometric…

  3. Psychometric properties of WISC-III items

    Directory of Open Access Journals (Sweden)

    Vera Lúcia Marques de Figueiredo

    2008-09-01

    Full Text Available A test is improved through the selection, substitution or revision of its items, and analyzing each item increases the test's validity and precision. This article presents results on the psychometric properties of the items of the WISC-III subtests with respect to difficulty, discrimination and validity. The WISC-III is an instrument widely used in the assessment of intelligence, and knowing the quality of its items is essential for the professional who administers the test. The analyses were based on the scores of 801 test protocols, collected during the study that adapted the instrument to the Brazilian context. They showed that the adapted items have adequate psychometric characteristics, supporting the use of the instrument as a reliable diagnostic tool.

  4. A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

    Directory of Open Access Journals (Sweden)

    James A Wiley

    Full Text Available We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.

  5. A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

    Science.gov (United States)

    Wiley, James A; Martin, John Levi; Herschkorn, Stephen J; Bond, Jason

    2015-01-01

    We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent variable can be interpreted as a probability of responding in a certain way to an arbitrarily specified item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribution for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.
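
The model described above treats an examinee's number-correct score on n items as binomial with success probability drawn from a two-parameter Beta distribution, which yields a beta-binomial score distribution. A minimal sketch of that distribution (the test length and Beta parameters below are illustrative, not taken from the paper):

```python
from math import comb, lgamma, exp

def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(score = k of n) when the success probability is Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

# Example: a 20-item test with latent probability ~ Beta(4, 2) (mean 2/3).
pmf = [beta_binomial_pmf(k, 20, 4.0, 2.0) for k in range(21)]
print(round(sum(pmf), 6))                     # probabilities sum to 1
print(max(range(21), key=lambda k: pmf[k]))   # most likely number-correct score
```

With a = b = 1 the latent distribution is uniform and every score 0..n is equally likely, which is one quick sanity check on the implementation.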

  6. Gating Items: Definition, Significance, and Need for Further Study

    Science.gov (United States)

    Judd, Wallace

    2009-01-01

    Over the past twenty years in performance testing a specific item type with distinguishing characteristics has arisen time and time again. It's been invented independently by dozens of test development teams. And yet this item type is not recognized in the research literature. This article is an invitation to investigate the item type, evaluate…

  7. Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts

    Science.gov (United States)

    Boo, Hong Kwen

    2007-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  8. Influence of Item Direction on Student Responses in Attitude Assessment.

    Science.gov (United States)

    Campbell, Noma Jo; Grissom, Stephen

    To investigate the effects of wording in attitude test items, a five-point Likert-type rating scale was administered to 173 undergraduate education majors. The test measured attitudes toward college and self, and contained 38 positively-worded items. Thirty-eight negatively-worded items were also written to parallel the positive statements.…

  9. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    Science.gov (United States)

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
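
The screening idea in this record can be sketched as an ordinary least-squares fit of new-form difficulty estimates on old-form estimates, flagging common items whose standardized residual is large. The difficulty values and the 2-standard-deviation cutoff below are invented for illustration; the authors' exact criterion may differ:

```python
from statistics import mean, stdev

def flag_outlier_items(b_old, b_new, z_cut=2.0):
    """Regress new-form IRT b-parameters on old-form b-parameters and
    return indices of common items whose residual exceeds z_cut
    standard deviations in absolute value."""
    mx, my = mean(b_old), mean(b_new)
    sxx = sum((x - mx) ** 2 for x in b_old)
    sxy = sum((x - mx) * (y - my) for x, y in zip(b_old, b_new))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(b_old, b_new)]
    s = stdev(resid)
    return [i for i, r in enumerate(resid) if abs(r) > z_cut * s]

# Illustrative difficulties: item 4 drifts badly between forms.
b_old = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
b_new = [-1.9, -1.4, -0.9, -0.4, 2.1, 0.6, 1.1, 1.6, 2.1, 2.6]
print(flag_outlier_items(b_old, b_new))  # index of the drifting item
```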

  11. Evaluating the Consistency of Test Items Relative to the Cognitive Model for Educational Cognitive Diagnosis

    Institute of Scientific and Technical Information of China (English)

    丁树良; 毛萌萌; 汪文义; 罗芬; CUI Ying

    2012-01-01

    Attributes and their hierarchy can represent a cognitive model, and building a correct cognitive model is one of the keys to successful cognitive diagnosis: if a diagnostic test does not represent the cognitive model completely and accurately, the validity of the test is in question, so it is important to check whether the test specification coincides with the cognitive model before the test is administered. This paper proposes an explicit index, the theoretical construct validity (TCV), that measures the extent to which the cognitive model is represented by the items of a diagnostic test. In terms of the TCV, the test reported by Tatsuoka and her colleagues (1988) is reanalyzed: the theoretical cognitive model yields 24 knowledge states, but the test specification can distinguish only 9 of them, so the TCV of that test is only 9/24. The paper also analyzes equivalence classes containing more than one knowledge state and the reasons they arise, and proposes a modification of the hierarchy consistency index (HCI), the person-fit index established by Cui and her colleagues to detect the fit of an examinee's observed response pattern (ORP) to an expected response pattern (ERP). The original HCI is not well defined when an examinee has mastered only one attribute, only one item in the test measures that attribute, and the examinee answers it correctly, because the number of comparisons, and hence the denominator, is zero; it also accounts for slipping only. The modified index combines slipping and guessing, so as to better detect the consistency between the data and the expert-specified cognitive model.
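
The TCV idea described in this record can be sketched as the proportion of admissible knowledge states that yield distinct expected response patterns on the test. The attribute hierarchy and Q-matrix below are invented for illustration, not taken from the paper:

```python
from itertools import combinations

def closed_states(n_attrs, prereqs):
    """All knowledge states (attribute sets) consistent with the hierarchy:
    mastering an attribute requires mastering all its prerequisites."""
    states = []
    for r in range(n_attrs + 1):
        for subset in combinations(range(n_attrs), r):
            s = set(subset)
            if all(set(prereqs.get(a, ())) <= s for a in s):
                states.append(frozenset(s))
    return states

def tcv(n_attrs, prereqs, q_matrix):
    """Proportion of admissible knowledge states that the test items
    can tell apart via their expected response patterns (an item is
    expected correct iff the state contains all of its attributes)."""
    states = closed_states(n_attrs, prereqs)
    patterns = {s: tuple(set(item) <= s for item in q_matrix) for s in states}
    return len(set(patterns.values())) / len(states)

# Hypothetical example: 3 attributes, A0 prerequisite of A1 and A2;
# the test measures only A0 and A1, so states differing in A2 collapse.
prereqs = {1: (0,), 2: (0,)}
q = [(0,), (0, 1)]
print(tcv(3, prereqs, q))  # 3 distinguishable patterns over 5 states
```

Adding an item that measures A2 raises the index to 1.0, mirroring the paper's point that a low TCV signals items missing from the test specification.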

  12. Bookmark locations and item response model selection in the presence of local item dependence.

    Science.gov (United States)

    Skaggs, Gary

    2007-01-01

    The bookmark standard setting procedure is a popular method for setting performance standards on state assessment programs. This study reanalyzed data from an application of the bookmark procedure to a passage-based test that used the Rasch model to create the item ordered booklet. Several problems were noted in this implementation of the bookmark procedure, including disagreement among the SMEs about the correct order of items in the bookmark booklet, performance level descriptions of the passing standard being based on passage difficulty as well as item difficulty, and the presence of local item dependence within reading passages. Bookmark item locations were recalculated for the IRT three-parameter model and the multidimensional bifactor model. The results showed that the order of item locations was very similar for all three models when items of high difficulty and low discrimination were excluded. However, the items whose positions were the most discrepant between models were not the items that the SMEs disagreed about the most in the original standard setting. The choice of latent trait model did not address problems of item order disagreement. Implications for the use of the bookmark method in the presence of local item dependence are discussed.
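
Under the Rasch model used to order the bookmark booklet, the cut score implied by a bookmark placement is the ability θ at which the bookmarked item is answered correctly with the chosen response probability (commonly RP67). A sketch of that mapping, with invented difficulty values:

```python
from math import log, exp

def rasch_prob(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def bookmark_cut(b, rp=0.67):
    """Ability at which an item of difficulty b is answered correctly
    with probability rp (the RP67 convention)."""
    return b + log(rp / (1.0 - rp))

# Item difficulties in booklet order (illustrative); bookmark on item 5.
difficulties = [-1.4, -0.9, -0.3, 0.2, 0.8, 1.3]
cut = bookmark_cut(difficulties[4])
print(round(cut, 3))
print(round(rasch_prob(cut, difficulties[4]), 2))  # recovers rp
```

The same mapping under a 3PL or bifactor model would shift these locations, which is the comparison the study makes across models.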

  13. A Brief Analysis of a Hot Test Item Type in the NMT: Ion Inference

    Institute of Scientific and Technical Information of China (English)

    寸智

    2012-01-01

    Ion inference is one of the main item types on ion reactions and one on which students lose points heavily. Because such items cover a wide range of knowledge, are strongly comprehensive and demand a large thinking capacity, they can effectively assess students' ability to apply knowledge comprehensively and to reason, and are therefore favored by NMT item writers. This paper briefly summarizes problem-solving strategies for this kind of item and analyzes some classic NMT examples.

  14. Intelligent charging of additional fees for test items requested electronically at the doctor workstation

    Institute of Scientific and Technical Information of China (English)

    范久波; 刘海菊; 刘晓东

    2011-01-01

    Objective: To explore the implementation and value of intelligently charging additional fees when test items are requested electronically at the doctor workstation. Methods: Single test items and test panels were divided into groups. Within each group, the blood-collection fee and one materials fee are first added to the order; the sample type of each item is then compared with the sample types already added, and if it differs, a further materials fee is added. When several groups are involved, one materials fee is first added for each group, and the same comparison is then applied within each group. Results: When a clinician selects test items, the blood-collection and materials fees required for sampling are added to the order automatically. When an item is removed, the clinician only needs to delete the item itself; the charging program determines which additional fees must be retained for the remaining items and deletes the unnecessary ones automatically. Special cases are charged according to the actual situation. Conclusion: Intelligent charging of additional fees for electronically requested test items shows that hospital information system construction should attend not only to large, comprehensive functions but also to small, fine details, so as to facilitate daily work.

  15. Item response theory

    Directory of Open Access Journals (Sweden)

    Eutalia Aparecida Candido de Araujo

    2009-12-01

    Full Text Available The concern with measuring psychological traits is old, and many studies and methods have been developed to achieve this goal. Among them, Item Response Theory (IRT) stands out; it was initially proposed to overcome limitations of Classical Test Theory, which is still widely used in the measurement of psychological traits. The central point of IRT is that it considers each item individually, rather than relying on total scores; conclusions therefore depend not only on the test or questionnaire as a whole but on each item that composes it. This article presents this theory, which revolutionized measurement theory.

  16. Item response theory - A first approach

    Science.gov (United States)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also prompted numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
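
The logistic models with one, two and three parameters mentioned above differ only in which of the discrimination (a), difficulty (b) and lower-asymptote guessing (c) parameters are left free. A minimal sketch of the three-parameter logistic (3PL) response function, with illustrative parameter values:

```python
from math import exp

def p_3pl(theta, a=1.0, b=0.0, c=0.0):
    """3PL probability of a correct response; with c=0 it reduces to
    the 2PL, and with a=1, c=0 to the one-parameter (Rasch) model."""
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))

# An item with moderate discrimination, difficulty 0.5, guessing 0.2:
for theta in (-2.0, 0.5, 2.0):
    print(round(p_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```

At theta = b the 3PL probability is c + (1 - c)/2, which is why the curve's inflection point sits above 0.5 whenever guessing is present.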

  17. Item Dimension Identification of Psychological Tests Based on Statistical Variable Selection Methods

    Institute of Scientific and Technical Information of China (English)

    孙佳楠; 杨武岳; 陈秋

    2016-01-01

    Multidimensional psychological tests are widely used to evaluate examinees' latent traits in many kinds of assessment. Although the possible latent traits, or so-called dimensions, of a test may be known to some extent, the dimensions probed by each item still need to be identified before the test can be applied. Based on multidimensional item response theory and on shrinkage estimation methods for statistical variable selection, this study explored how to identify the item-dimension correspondence in some typical psychological tests statistically, exploiting the relationship between multidimensional item response models and generalized linear models. Simulation studies investigating the performance of the proposed method showed that the LASSO-based method identified the dimensions of test items more accurately than the elastic-net-based method across different types of multidimensional tests.
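
The LASSO used in this record selects, for each item, which dimensions carry a nonzero loading by shrinking small coefficients exactly to zero. A toy coordinate-descent sketch on simulated data (not the authors' implementation, and a linear stand-in for the IRT likelihood; all names and values are illustrative):

```python
import random

def soft_threshold(rho, lam):
    """Soft-thresholding operator, the core of the LASSO update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    msq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / msq[j]
    return beta

# Simulate: the trait depends on dimension 0 only; dimension 1 is noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [x0 * 1.5 + rng.gauss(0, 0.1) for x0, _ in X]
beta = lasso_cd(X, y, lam=0.2)
print([round(b, 2) for b in beta])  # dimension 1 shrunk exactly to zero
```

The exact-zero coefficients are what make the method usable for dimension identification: a dimension is declared relevant to an item only if its coefficient survives the shrinkage.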

  18. Study of the Match between Passage and Item Numbers on the Reading Comprehension Subsection of the Chinese Proficiency Test

    Institute of Scientific and Technical Information of China (English)

    柴省三

    2014-01-01

    The selection of passages, the design of passage-based multiple-choice items, and the match between the numbers of passages and items are among the most important factors affecting the reliability and validity of a reading comprehension test, and different combinations of passage and item numbers affect measurement error and reliability differently. This study applied generalizability theory, using a random two-facet nested design s×(i:p), to investigate the sources and structure of error in the reading comprehension section of the Chinese Proficiency Test (HSK) and to examine whether its combination of passage and item numbers is reasonable. The study sampled 500 test takers from the 7,238 participants in the HSK administration of April 2011 in mainland China, isolating the variance components due to persons, items and passages and their effects on score dependability. The results indicated that the largest variance component was items within passages, and that although increasing either the number of passages or the number of items raises the precision of the test, increasing the number of passages contributes more to the generalizability coefficient (Eρ2) than increasing the number of items. Taken together, the findings show that the current combination of passage and item numbers in the HSK satisfies the principles of error control and the requirements for reliability.
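
For the s×(i:p) design in this record, the D-study generalizability coefficient takes the standard nested form, with the person-by-passage and person-by-item-within-passage components divided by the numbers of passages and items. A sketch with invented variance components, reproducing the qualitative finding that adding passages helps more than adding items:

```python
def g_coefficient(var_s, var_sp, var_si_p, n_passages, n_items):
    """Generalizability coefficient for the s x (i:p) design
    (subjects crossed with items nested in passages), D-study form."""
    rel_err = var_sp / n_passages + var_si_p / (n_passages * n_items)
    return var_s / (var_s + rel_err)

# Invented variance components for illustration only.
var_s, var_sp, var_si_p = 0.30, 0.06, 0.40
base = g_coefficient(var_s, var_sp, var_si_p, n_passages=5, n_items=4)
more_items = g_coefficient(var_s, var_sp, var_si_p, n_passages=5, n_items=8)
more_passages = g_coefficient(var_s, var_sp, var_si_p, n_passages=10, n_items=4)
print(round(base, 3), round(more_items, 3), round(more_passages, 3))
```

Doubling the passages shrinks both error terms, while doubling the items per passage shrinks only the nested term, which is why the passage count dominates Eρ2.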

  19. Investigation and analysis of the current status of testing items in hospital clinical laboratories in China

    Institute of Scientific and Technical Information of China (English)

    钟堃; 王薇; 何法霖; 王治国

    2015-01-01

    Objective: To investigate the current status of hospital clinical laboratories in China, including the test items offered, the testing volume of each item, the charges for each item, and the turnaround time (TAT) of each item, and to identify the items accounting for the largest share of testing so as to better target quality control, financial investment and policy making. Methods: In each province, autonomous region and municipality (except Tibet and Taiwan), 3 grade-A tertiary hospitals, 3 tertiary hospitals and 3 secondary hospitals were randomly selected, 270 hospitals in total. Questionnaires were sent to the directors of the laboratory departments and returned through the network system of the National Center for Clinical Laboratories of the Ministry of Health, and the results were analyzed with Microsoft Excel 2007. The information collected included basic hospital information, the test items and panels offered, annual testing volume, charges and TAT. Results: All 270 hospitals returned valid results, a response rate of 100%. There were 628 single test items in all. Clinical immunology and serology had the largest number of item types, 230, or 36.62% (230/628), while clinical biochemistry had the highest share of total testing volume, 59.97%. The 100 single items with the largest testing volumes accounted for more than 90% of the total testing volume, and the 100 items with the highest charges accounted for more than 85% of the total charges. Conclusion: The information obtained in this survey on test items, testing volumes, charges and TAT in Chinese hospital laboratories provides a useful reference for quality control, financial investment and policy making in laboratory medicine. (Chinese Journal of Laboratory Medicine, 2015, 38: 637-641)

  20. The Development of Practical Item Analysis Program for Indonesian Teachers

    Directory of Open Access Journals (Sweden)

    Ali Muhson

    2017-04-01

    Full Text Available Item analysis has an essential role in learning assessment: an item analysis program is designed to measure student achievement and instructional effectiveness. This study aimed to develop an item analysis program and verify its feasibility. It used a Research and Development (R&D) model whose procedure includes designing and developing the product, validating it, and testing it; the data were collected through documentation, questionnaires, and interviews. The study successfully developed an item analysis program, named AnBuso, based on classical test theory (CTT), which proved practical and applicable for Indonesian teachers analyzing test items.
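
The classical-test-theory statistics such a program reports can be sketched in a few lines: item difficulty as the proportion correct, and discrimination as the corrected item-rest point-biserial correlation. The score matrix below is invented, and this is a generic CTT sketch rather than AnBuso's actual code:

```python
from statistics import mean, stdev

def item_analysis(responses):
    """responses: one 0/1 score vector per examinee. Returns, per item,
    (difficulty, discrimination): proportion correct and corrected
    item-rest point-biserial correlation."""
    n = len(responses)
    results = []
    for j in range(len(responses[0])):
        item = [r[j] for r in responses]
        rest = [sum(r) - r[j] for r in responses]  # total minus the item itself
        p = sum(item) / n
        si, sr = stdev(item), stdev(rest)
        if si == 0 or sr == 0:
            disc = 0.0  # undefined when either score is constant
        else:
            mi, mr = mean(item), mean(rest)
            cov = sum((a - mi) * (b - mr) for a, b in zip(item, rest)) / (n - 1)
            disc = cov / (si * sr)
        results.append((round(p, 2), round(disc, 2)))
    return results

scores = [  # 6 examinees x 4 items (invented data)
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(item_analysis(scores))
```

Using the rest score (total minus the item) avoids the inflation that comes from correlating an item with a total that already contains it.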

  1. Assessing the Psychometric Properties of Alternative Items for Certification.

    Science.gov (United States)

    Krogh, Mary Anne; Muckle, Timothy

    Alternative items were added as scored items to the National Certification Examination for Nurse Anesthetists (NCE) in 2010. A common concern about the new items has been their measurement attributes, so this study evaluated the psychometric impact of adding them to the examination. Candidates had a significantly higher ability estimate on alternative items than on multiple-choice questions (MCQs), and 6.7 percent of candidates performed significantly differently on alternative item formats; ability estimates from the two formats correlated at r = .58. The alternative items took significantly longer to answer than standard MCQs and discriminated to a higher degree. They exhibited unidimensionality to the same degree as MCQs, and the BIC confirmed the Rasch model as acceptable for scoring. The new item types were found to have acceptable attributes for inclusion in the certification program.

  2. Are Inferential Reading Items More Susceptible to Cultural Bias than Literal Reading Items?

    Science.gov (United States)

    Banks, Kathleen

    2012-01-01

    The purpose of this article is to illustrate a seven-step process for determining whether inferential reading items were more susceptible to cultural bias than literal reading items. The seven-step process was demonstrated using multiple-choice data from the reading portion of a reading/language arts test for fifth and seventh grade Hispanic,…

  3. Curriculum, Translation, and Differential Functioning of Measurement and Geometry Items

    Science.gov (United States)

    Emenogu, Barnabas C.; Childs, Ruth A.

    2005-01-01

    A test item exhibits differential item functioning (DIF) if students with the same ability find it differentially difficult. When the item is administered in French and English, differences in language difficulty and meaning are the most likely explanations. However, curriculum differences may also contribute to DIF. The responses of Ontario…

  4. A method for designing IRT-based item banks

    NARCIS (Netherlands)

    Boekkooi-Timminga, Ellen

    1990-01-01

    Since 1985 several procedures for computerized test construction using linear programing techniques have been described in the literature. To apply these procedures successfully, suitable item banks are needed. The problem of designing item banks based on item response theory (IRT) is addressed.

  5. SHIPPING OF RADIOACTIVE ITEMS

    CERN Multimedia

    TIS/RP Group

    2001-01-01

    The TIS-RP group informs users that shipping of small radioactive items is normally guaranteed within 24 hours from the time the material is handed in at the TIS-RP service. This time is imposed by the necessary procedures (identification of the radionuclides, determination of dose rate, preparation of the package and related paperwork). Large and massive objects require a longer procedure and will therefore take longer.

  6. Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores with Item Exposure Control and Content Constraints

    Science.gov (United States)

    Yao, Lihua

    2014-01-01

    The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…

  8. Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

    Science.gov (United States)

    Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

    2013-01-01

    Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…

  9. A Comparison of Anchor-Item Designs for the Concurrent Calibration of Large Banks of Likert-Type Items

    Science.gov (United States)

    Garcia-Perez, Miguel A.; Alcala-Quintana, Rocio; Garcia-Cueto, Eduardo

    2010-01-01

    Current interest in measuring quality of life is generating interest in the construction of computerized adaptive tests (CATs) with Likert-type items. Calibration of an item bank for use in CAT requires collecting responses to a large number of candidate items. However, the number is usually too large to administer to each subject in the…

  10. [Perceptions on item disclosure for the Korean medical licensing examination].

    Science.gov (United States)

    Yang, Eunbae B

    2015-09-01

    This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide; responses were analyzed from 718 students as well as from 69 faculty members who had participated in creating the licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. Respondents considered it important to maintain test quality and to keep the test items unavailable to the public. Students were concerned that disclosure would prompt an increase in item difficulty (48.3%), and few found it desirable to disclose test items unconditionally (28.5%). The professors, who had experience in designing the test items, also opposed item disclosure (60.9%). It is therefore desirable not to disclose the items of the Korean medical licensing examination to the public, on the condition that students are provided with sufficient information about the examination, so that the exam can appropriately identify candidates with the required qualifications.

  12. Identification of candidate children for maturity-onset diabetes of the young type 2 (MODY2) gene testing: a seven-item clinical flowchart (7-iF).

    Science.gov (United States)

    Pinelli, Michele; Acquaviva, Fabio; Barbetti, Fabrizio; Caredda, Elisabetta; Cocozza, Sergio; Delvecchio, Maurizio; Mozzillo, Enza; Pirozzi, Daniele; Prisco, Francesco; Rabbone, Ivana; Sacchetti, Lucia; Tinto, Nadia; Toni, Sonia; Zucchini, Stefano; Iafusco, Dario

    2013-01-01

MODY2 is the most prevalent monogenic form of diabetes in Italy, with an estimated prevalence of about 0.5-1.5%. MODY2 is potentially indistinguishable from other forms of diabetes; however, its identification impacts patients' quality of life and healthcare resources. Unfortunately, direct DNA sequencing as a diagnostic test is expensive and not readily accessible. In addition, current guidelines, which aim to establish when the test should be performed, have shown a poor detection rate. The aim of this study is to propose a reliable, easy-to-use tool to identify candidate patients for MODY2 genetic testing. We designed and validated a diagnostic flowchart in an attempt to improve the detection rate and to increase the number of properly requested tests. The flowchart, called 7-iF, consists of 7 binary "yes or no" questions, and its unequivocal output is an indication of whether or not to test. We tested the 7-iF to estimate its clinical utility in comparison with clinical suspicion alone. In a prospective 2-year study (921 diabetic children), the 7-iF showed a precision of about 76%. Using retrospective data, the 7-iF showed a precision in identifying MODY2 patients of about 80%, compared with 40% for clinical suspicion alone. On the other hand, despite a relatively high number of missed MODY2 patients, the 7-iF would not suggest the test for 90% of the non-MODY2 patients, demonstrating that wide application of this method might 1) help less experienced clinicians suspect MODY2 and 2) reduce the number of unnecessary tests. With the 7-iF, a clinician can feel confident of identifying a potential case of MODY2 and suggest the molecular test without fear of wasting time and money. A QALY-type analysis estimated an increase in patients' quality of life and savings for the healthcare system of about 9 million euros per year.

  13. Identification of candidate children for maturity-onset diabetes of the young type 2 (MODY2) gene testing: a seven-item clinical flowchart (7-iF).

    Directory of Open Access Journals (Sweden)

    Michele Pinelli

Full Text Available MODY2 is the most prevalent monogenic form of diabetes in Italy, with an estimated prevalence of about 0.5-1.5%. MODY2 is potentially indistinguishable from other forms of diabetes; however, its identification impacts patients' quality of life and healthcare resources. Unfortunately, direct DNA sequencing as a diagnostic test is expensive and not readily accessible. In addition, current guidelines, which aim to establish when the test should be performed, have shown a poor detection rate. The aim of this study is to propose a reliable, easy-to-use tool to identify candidate patients for MODY2 genetic testing. We designed and validated a diagnostic flowchart in an attempt to improve the detection rate and to increase the number of properly requested tests. The flowchart, called 7-iF, consists of 7 binary "yes or no" questions, and its unequivocal output is an indication of whether or not to test. We tested the 7-iF to estimate its clinical utility in comparison with clinical suspicion alone. In a prospective 2-year study (921 diabetic children), the 7-iF showed a precision of about 76%. Using retrospective data, the 7-iF showed a precision in identifying MODY2 patients of about 80%, compared with 40% for clinical suspicion alone. On the other hand, despite a relatively high number of missed MODY2 patients, the 7-iF would not suggest the test for 90% of the non-MODY2 patients, demonstrating that wide application of this method might 1) help less experienced clinicians suspect MODY2 and 2) reduce the number of unnecessary tests. With the 7-iF, a clinician can feel confident of identifying a potential case of MODY2 and suggest the molecular test without fear of wasting time and money. A QALY-type analysis estimated an increase in patients' quality of life and savings for the healthcare system of about 9 million euros per year.

  14. 主观性试题网上评阅趋中评分控制研究初探%Research on Controlling Central Rating in Net-based Scoring of Subjective Test Items

    Institute of Scientific and Technical Information of China (English)

    彭恒利; 俞韫烨

    2013-01-01

  Researchers find that in the scoring of subjective test items such as compositions, some raters tend to assign central scores, avoiding the high and low ends of the rating scale; this is called "central rating." Central rating is a systematic scoring error that degrades test quality. This paper analyzes central rating in net-based scoring of subjective test items. It distinguishes three types of central rating and analyzes their causes. Central rating can be detected by two methods, the anchor-paper method and the statistical-index method, and throughout test development and scoring, multiple measures, both technical and non-technical, should be applied to control it.

  15. Exploring Differential Effects across Two Decoding Treatments on Item-Level Transfer in Children with Significant Word Reading Difficulties: A New Approach for Testing Intervention Elements

    Science.gov (United States)

    Steacy, Laura M.; Elleman, Amy M.; Lovett, Maureen W.; Compton, Donald L.

    2016-01-01

    In English, gains in decoding skill do not map directly onto increases in word reading. However, beyond the Self-Teaching Hypothesis, little is known about the transfer of decoding skills to word reading. In this study, we offer a new approach to testing specific decoding elements on transfer to word reading. To illustrate, we modeled word-reading…

  17. An Item Response Theory–Based, Computerized Adaptive Testing Version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS)

    DEFF Research Database (Denmark)

    Makransky, Guido; Dale, Philip S.; Havmose, Philip

    2016-01-01

    Purpose: To investigate the feasibility and potential validity of an IRT-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS) vocabulary checklist, with the objective of reducing length while maintaining measureme...

  18. Item Response Theory Using Hierarchical Generalized Linear Models

    Directory of Open Access Journals (Sweden)

    Hamdollah Ravand

    2015-03-01

Full Text Available Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF), and capture both local item and local person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues, with applications serving as add-ons to simulation studies with a methodological focus. Although the methodological direction was necessary as a first step to show how MLMs can be utilized and extended to model item response data, the emphasis needs to shift towards providing evidence on how applications of MLMs in educational testing can deliver the benefits that have been promised. The present study uses foreign language reading comprehension data to illustrate the application of hierarchical generalized linear models to estimate person and item parameters, differential item functioning (DIF), and local person dependence in a three-level model.

  19. Identifying Unbiased Items for Screening Preschoolers for Disruptive Behavior Problems.

    Science.gov (United States)

    Studts, Christina R; Polaha, Jodi; van Zyl, Michiel A

    2016-10-25

OBJECTIVE: Efficient identification and referral to behavioral services are crucial in addressing early-onset disruptive behavior problems. Existing screening instruments for preschoolers are not ideal for pediatric primary care settings serving diverse populations. Eighteen candidate items for a new brief screening instrument were examined to identify those exhibiting measurement bias (i.e., differential item functioning, DIF) by child characteristics. METHOD: Parents/guardians of preschool-aged children (N = 900) from four primary care settings completed two full-length behavioral rating scales. Items measuring disruptive behavior problems were tested for DIF by child race, sex, and socioeconomic status using two approaches: item response theory-based likelihood ratio tests and ordinal logistic regression. RESULTS: Of 18 items, eight were identified with statistically significant DIF by at least one method. CONCLUSIONS: The bias observed in 8 of 18 items made them undesirable for screening diverse populations of children. These items were excluded from the new brief screening tool.
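The logistic-regression approach to DIF used in the record above can be sketched as a likelihood-ratio test: fit the item response on the matching variable (total score) alone, then again with a group term, and compare fits. This is a minimal illustration with a binary (rather than ordinal) item and hypothetical function names, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def _nll(beta, X, y):
    """Negative log-likelihood of a logistic regression."""
    z = X @ beta
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def _best_nll(X, y):
    """Minimized negative log-likelihood (BFGS from a zero start)."""
    res = minimize(_nll, np.zeros(X.shape[1]), args=(X, y), method="BFGS")
    return res.fun

def lr_uniform_dif(total_score, group, item_correct):
    """Likelihood-ratio chi-square (1 df) for uniform DIF: does group
    membership predict the item response beyond total score?"""
    n = len(item_correct)
    X0 = np.column_stack([np.ones(n), total_score])          # matching only
    X1 = np.column_stack([np.ones(n), total_score, group])   # + group term
    return 2.0 * (_best_nll(X0, item_correct) - _best_nll(X1, item_correct))
```

The statistic is compared against the chi-square critical value (3.84 at the 5% level for 1 df); a significant value flags uniform DIF for the studied item.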

  20. The Determination of Hierarchies among TOEFL Vocabulary and Reading Comprehension Items.

    Science.gov (United States)

    Perkins, Kyle; And Others

    A study was undertaken to identify the prerequisite relations (or hierarchies among the items) existing in the item responses of a sample of 86 foreign students who took the Test of English as a Foreign Language (TOEFL) vocabulary and reading comprehension test, Form 3JTF1. The form contains 30 vocabulary items and 30 reading comprehension items.…

  1. Designing a Virtual Item Bank Based on the Techniques of Image Processing

    Science.gov (United States)

    Liao, Wen-Wei; Ho, Rong-Guey

    2011-01-01

A major weakness of figural items in Intelligence Quotient (IQ) tests is the inaccuracy that results from their high item exposure rates. In this study, a new approach is proposed and a useful test tool known as the Virtual Item Bank (VIB) is introduced. The VIB combines Automatic Item Generation theory and image processing theory with the concepts of…

  2. Faculty development on item writing substantially improves item quality.

    NARCIS (Netherlands)

    Naeem, N.; Vleuten, C.P.M. van der; Alfaris, E.A.

    2012-01-01

The quality of items written for in-house examinations in medical schools remains a cause of concern. Several faculty development programs aim to improve faculty members' item-writing skills. The purpose of this study was to evaluate the effectiveness of a faculty development program in item development.

  3. Evaluation of the Magnitude of Differential Item Functioning in Polytomous Items. Program Statistics Research Technical Report No. 94-2.

    Science.gov (United States)

    Zwick, Rebecca; Thayer, Dorothy T.

    Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is a possible hypothesis-testing method for this purpose. The development of descriptive statistics for…

  4. Feasibility of using training cases from International Spinal Cord Injury Core Data Set for testing of International Standards for Neurological Classification of Spinal Cord Injury items

    DEFF Research Database (Denmark)

    Liu, N; Hu, Z W; Zhou, M W;

    2014-01-01

STUDY DESIGN: Descriptive comparison analysis. OBJECTIVE: To evaluate whether five training cases of the International Spinal Cord Injury Core Data Set (ISCICDS) are appropriate for testing the facts within the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) and could thus be used for testing its training effectiveness. METHODS: The authors reviewed the five training cases from the ISCICDS and determined the sensory level (SL), motor level (ML) and American Spinal Injury Association Impairment Scale (AIS) for the training cases. The key points from the training cases were compared with our interpretation of the key aspects of the ISNCSCI. RESULTS: For determining SL, three principles of ML, sacral sparing, complete injury, classification of AIS A, B, C and D, determining motor incomplete status through sparing of motor function more than three levels below…

  5. Assessing normative cut points through differential item functioning analysis: An example from the adaptation of the Middlesex Elderly Assessment of Mental State (MEAMS) for use as a cognitive screening test in Turkey

    Directory of Open Access Journals (Sweden)

    Kutlay Sehim

    2006-03-01

Full Text Available Abstract Background The Middlesex Elderly Assessment of Mental State (MEAMS) was developed as a screening test to detect cognitive impairment in the elderly. It includes 12 subtests, each having a 'pass score'. A series of tasks were undertaken to adapt the measure for use in the adult population in Turkey and to determine the validity of existing cut points for passing subtests, given the wide range of educational level in the Turkish population. This study focuses on identifying and validating the scoring system of the MEAMS for the Turkish adult population. Methods After the translation procedure, 350 normal subjects and 158 acquired brain injury patients were assessed with the Turkish version of the MEAMS. Initially, appropriate pass scores for the normal population were determined through ANOVA post-hoc tests according to age, gender and education. Rasch analysis was then used to test the internal construct validity of the scale and the validity of the cut points for pass scores on the pooled data, using Differential Item Functioning (DIF) analysis within the framework of the Rasch model. Results Data with the initially modified pass scores were analyzed. DIF was found for certain subtests by age and education, but not for gender. Following this, pass scores were further adjusted and the data re-fitted to the model. All subtests were found to fit the Rasch model (mean item fit 0.184, SD 0.319; person fit -0.224, SD 0.557) and DIF was then found to be absent. Thus the final pass scores for all subtests were determined. Conclusion The MEAMS offers a valid assessment of cognitive state for the adult Turkish population, and the revised cut points accommodate age and education. Further studies are required to ascertain the validity in different diagnostic groups.

  6. Objective Tests versus Subjective tests

    Institute of Scientific and Technical Information of China (English)

    魏福林

    2007-01-01

An objective test item has only one correct answer, while a subjective test item has a range of possible answers. Because of this feature, reliability is not difficult to achieve in the marking of objective items, whereas the marking of subjective items is less reliable. On the whole, a good test should contain both subjective and objective test items.

  7. An algorithm for a computerized adaptive test based on item response theory for estimating the usability of e-commerce sites

    Directory of Open Access Journals (Sweden)

    Fernando de Jesus Moreira Junior

    2013-09-01

Full Text Available This article proposes an algorithm for a computerized adaptive test based on item response theory, developed to estimate the degree of usability of e-commerce sites. Five algorithms based on the maximum-information criterion were developed and tested via simulation. The best-performing algorithm was applied to real data from 361 e-commerce sites. The results showed that the algorithm obtains a good estimate of the usability of e-commerce sites with the administration of 13 items.
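The core of a maximum-information adaptive test like the one described in this record can be sketched in a few functions: at each step, administer the unused item with the highest Fisher information at the current ability estimate, then re-estimate ability. This is a generic 2PL sketch with hypothetical names, not the authors' algorithm:

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct/positive response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def eap_estimate(responses, a, b, grid=np.linspace(-4.0, 4.0, 81)):
    """EAP ability estimate under a standard-normal prior, on a grid."""
    prior = np.exp(-grid ** 2 / 2.0)
    like = np.ones_like(grid)
    for ai, bi, x in zip(a, b, responses):
        p = p_2pl(grid, ai, bi)
        like = like * (p if x else 1.0 - p)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))

def run_cat(bank_a, bank_b, true_theta, n_items=13, seed=0):
    """Simulate a maximum-information CAT of fixed length n_items.
    bank_a, bank_b: numpy arrays of item discriminations/difficulties."""
    rng = np.random.default_rng(seed)
    administered, responses = [], []
    theta = 0.0
    for _ in range(n_items):
        info = item_information(theta, bank_a, bank_b)
        info[administered] = -np.inf          # never reuse an item
        j = int(np.argmax(info))
        x = int(rng.random() < p_2pl(true_theta, bank_a[j], bank_b[j]))
        administered.append(j)
        responses.append(x)
        theta = eap_estimate(responses, bank_a[administered], bank_b[administered])
    return administered, theta
```

The fixed test length of 13 items mirrors the figure reported in the abstract; a production stopping rule would more likely use a standard-error threshold.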

  8. Applied Research on the Construction of the Textbook-based Test Item Bank for the Course of English in Vocational Colleges

    Institute of Scientific and Technical Information of China (English)

    邹园艳; 李志萍

    2012-01-01

Scientific testing is an essential guarantee of teaching quality, so the establishment of a scientific and effective test system is necessary. Based on an analysis of the necessity of constructing a textbook-based test item bank for the course of English in vocational colleges, this essay probes into the bank's test-design strategies. It also describes the bank's practical application at Chongqing College of Electronic Engineering and points out problems that deserve attention in its construction.

  9. Physics Items and Student's Performance at Enem

    CERN Document Server

    Gonçalves, Wanderley P

    2013-01-01

The Brazilian National Assessment of Secondary Education (ENEM, Exame Nacional do Ensino Médio) changed in 2009: from a self-assessment of competences at the end of high school to an assessment that grants access to college and student financing. From a single general exam, there are now tests in four areas: Mathematics, Language, Natural Sciences and Social Sciences. A new Reference Matrix was built with components such as cognitive domains, competencies, skills and knowledge objects; the methodological framework also changed, now using Item Response Theory to provide scores and allow longitudinal comparison of results across years, providing conditions for monitoring high school quality in Brazil. We present a study of the items in the Natural Science Test of ENEM over the years 2009, 2010 and 2011. Qualitative variables are proposed to characterize the items, and data from students' responses to Physics items were analysed. The qualitative analysis reveals the characteristics of the…

  10. The 12-item World Health Organization Disability Assessment Schedule II (WHO-DAS II): a nonparametric item response analysis

    Directory of Open Access Journals (Sweden)

    Fernandez Ana

    2010-05-01

Full Text Available Abstract Background Previous studies have analyzed the psychometric properties of the World Health Organization Disability Assessment Schedule II (WHO-DAS II) using classical omnibus measures of scale quality. These analyses are sample dependent and do not model item responses as a function of the underlying trait level. The main objective of this study was to examine the effectiveness of the WHO-DAS II items and their options in discriminating between changes in the underlying disability level by means of item response analyses. We also explored differential item functioning (DIF) in men and women. Methods The participants were 3615 adult general practice patients from 17 regions of Spain, with a first diagnosed major depressive episode. The 12-item WHO-DAS II was administered by the general practitioners during the consultation. We used a non-parametric item response method (kernel smoothing), implemented with the TestGraf software, to examine the effectiveness of each item (item characteristic curves) and its options (option characteristic curves) in discriminating between changes in the underlying disability level. We examined composite DIF to determine whether women had a higher probability than men of endorsing each item. Results Item response analyses indicated that the twelve items forming the WHO-DAS II perform very well. All items were determined to provide good discrimination across varying standardized levels of the trait. The items also had option characteristic curves that showed good discrimination, given that each increasing option became more likely than the previous one as a function of increasing trait level. No gender-related DIF was found on any of the items. Conclusions All WHO-DAS II items were very good at assessing overall disability. Our results supported the appropriateness of the weights assigned to response option categories and showed an absence of gender differences in item functioning.
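The kernel-smoothing idea behind TestGraf can be illustrated in a few lines: a nonparametric item characteristic curve is a Gaussian-weighted (Nadaraya-Watson) average of item scores among examinees whose trait estimates lie near each evaluation point. This is a generic sketch with a hypothetical function name, not TestGraf's implementation:

```python
import numpy as np

def kernel_icc(trait, item_scores, grid, bandwidth=0.4):
    """Kernel-smoothed item characteristic curve: for each grid point,
    a Gaussian-weighted mean of item scores of examinees whose trait
    estimates lie nearby. Works for binary or ordinal item scores."""
    curve = np.empty(len(grid))
    for k, t0 in enumerate(grid):
        w = np.exp(-0.5 * ((trait - t0) / bandwidth) ** 2)
        curve[k] = np.sum(w * item_scores) / np.sum(w)
    return curve
```

For a well-functioning item, the resulting curve should rise monotonically with the trait; option characteristic curves are obtained the same way, smoothing the indicator of each response option separately.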

  11. Multivariate Associations Among Health-Related Fitness, Physical Activity, and TGMD-3 Test Items in Disadvantaged Children From Low-Income Families.

    Science.gov (United States)

    Burns, Ryan; Brusseau, Tim; Hannon, James

    2016-10-04

Motor skills are needed for physical development and may be linked to health-related fitness and physical activity levels. No studies have examined the relationships among these constructs in large samples of disadvantaged children from low-income families using the Test for Gross Motor Development-3rd Edition (TGMD-3). The purpose of this study was to examine the multivariate associations among health-related fitness, physical activity, and motor skills assessed using the TGMD-3. Participants included 1460 school-aged children (730 boys, 730 girls; M age = 8.4 years, SD = 1.8 years) recruited from kindergarten through sixth grade at three low-income schools. Health-related fitness was assessed using the FITNESSGRAM battery, physical activity was assessed using accelerometers and pedometers, and motor skills were assessed using the TGMD-3. Canonical correlations revealed statistically significant correlations between the ball skills and health-related fitness variates (Rc = 0.43, Rc² = 17%, p…) in children from low-income families.

  12. Item analysis of the Mayer-Salovey-Caruso Emotional Intelligence Test: strategic area

    OpenAIRE

    Ana Paula Porto Noronha; Ricardo Primi; Fernanda Andrade de Freitas; Marilda Aparecida Dantas

    2007-01-01

The present study aimed to analyze the items of the Strategic Area of the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) in order to investigate the internal consistency of the subtests, the item-total correlation and the scoring of the instrument's item responses. The study included 522 participants with a mean age of 23.78 years, ranging from 16 to 65 years (SD = 7.543). Of these, 281 (53.3%) were female and 238 (45.2%) male. The instrument was administered…

  13. Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

    Science.gov (United States)

    Saß, Steffani; Schütte, Kerstin

    2016-01-01

    Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…

  14. Adaptation of an Instrument for Measuring the Cognitive Complexity of Organic Chemistry Exam Items

    Science.gov (United States)

    Raker, Jeffrey R.; Trate, Jaclyn M.; Holme, Thomas A.; Murphy, Kristen

    2013-01-01

    Experts use their domain expertise and knowledge of examinees' ability levels as they write test items. The expert test writer can then estimate the difficulty of the test items subjectively. However, an objective method for assigning difficulty to a test item would capture the cognitive demands imposed on the examinee as well as be…

  15. Algorithm of computerized adaptive testing to estimate the usability of e-commerce sites

    Directory of Open Access Journals (Sweden)

    Fernando de Jesus Moreira Junior

    2012-01-01

Full Text Available This paper proposes an algorithm for computerized adaptive testing based on Item Response Theory, designed to estimate the degree of usability of e-commerce sites. Five algorithms based on the maximum-information criterion were developed and tested by simulation. The algorithm with the best performance was applied to real data from 361 e-commerce sites. The results showed that the algorithm could obtain good estimates of the degree of usability of e-commerce sites with the application of 13 items.

  16. The Prediction of TOEFL Reading Comprehension Item Difficulty for Expository Prose Passages for Three Item Types: Main Idea, Inference, and Supporting Idea Items.

    Science.gov (United States)

    Freedle, Roy; Kostin, Irene

    Prediction of the difficulty (equated delta) of a large sample (n=213) of reading comprehension items from the Test of English as a Foreign Language (TOEFL) was studied using main idea, inference, and supporting statement items. A related purpose was to examine whether text and text-related variables play a significant role in predicting item…

  17. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    Science.gov (United States)

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  18. Research on Item Development for the College Admission Test of Biology Based on Core Competence

    Institute of Scientific and Technical Information of China (English)

    吴成军

    2016-01-01

The components of core competence in biology, namely perception of life, rational thinking, scientific inquiry and social responsibility, are interdisciplinary in nature but have a unique value within the subject. A delineation of the definition and value of these components paves the way for developing items for the college admission test of the subject based on core competence. The test should not only tap different levels of the four components but also make a special effort to tap rational thinking and scientific inquiry, ideally in authentic contexts. The presentation styles of test items should be adapted as well, so as to give full play to the test as a "baton" for biology instruction and help cultivate students' core competence, the ultimate goal of instruction in the school subject.

  19. Item analysis of the Mayer-Salovey-Caruso Emotional Intelligence Test: strategic area

    Directory of Open Access Journals (Sweden)

    Ana Paula Porto Noronha

    2007-08-01

Full Text Available The present study aimed to analyze the items of the Strategic Area of the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) in order to investigate the internal consistency of the subtests, the item-total correlation and the scoring of the instrument's item responses. The study included 522 participants with a mean age of 23.78 years, ranging from 16 to 65 years (SD = 7.543). Of these, 281 (53.3%) were female and 238 (45.2%) male. The instrument was administered collectively, by different administrators trained for the task, at universities and companies in the interior of the state of São Paulo, with a mean duration of 45 minutes. The results indicated that, in general, the subtests presented acceptable levels of internal consistency according to the standards established by the Federal Council of Psychology, although some problems were found, and that the consensus scoring method to some extent hinders the creation of difficult items.

  20. Estimating the Importance of Differential Item Functioning.

    Science.gov (United States)

    Rudas, Tamas; Zwick, Rebecca

    1997-01-01

    The mixture index of fit (T. Rudas et al, 1994) is used to estimate the fraction of a population for which differential item functioning (DIF) occurs, and this approach is compared to the Mantel Haenszel test of DIF. The proposed noniterative procedure provides information about data portions contributing to DIF. (SLD)
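The Mantel-Haenszel test that the record above compares against is based on a common odds ratio pooled over score strata. As a minimal sketch (generic, with hypothetical function names, not the authors' procedure), the estimator can be computed directly from 2x2 tables per total-score stratum:

```python
import numpy as np

def mh_common_odds_ratio(total_score, group, item_correct):
    """Mantel-Haenszel common odds ratio for a studied item, stratifying
    examinees by total test score. group: 1 = reference, 0 = focal;
    item_correct: 1/0 on the studied item. A value near 1 suggests no
    uniform DIF; values far from 1 favor one group."""
    num = den = 0.0
    for s in np.unique(total_score):
        m = total_score == s
        A = np.sum(m & (group == 1) & (item_correct == 1))  # ref, correct
        B = np.sum(m & (group == 1) & (item_correct == 0))  # ref, incorrect
        C = np.sum(m & (group == 0) & (item_correct == 1))  # focal, correct
        D = np.sum(m & (group == 0) & (item_correct == 0))  # focal, incorrect
        n = A + B + C + D
        if n > 0:
            num += A * D / n
            den += B * C / n
    return num / den
```

In operational DIF screening the estimate is usually transformed to the ETS delta scale and paired with the MH chi-square statistic; both are omitted here for brevity.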

  2. 探讨相关护理因素对凝血4项检测结果的影响%The Effect of Relative Nursing Factor on the Test Results About Four Items of Blood Coagulation

    Institute of Scientific and Technical Information of China (English)

    毛黎

    2015-01-01

Objective: To explore the effect of related nursing factors on the test results of the four items of blood coagulation. Methods: Based on the clinical nursing data of 860 inpatient cases in our hospital from June 2013 to June 2015, the effects of the relevant nursing factors on the results of the four coagulation items were discussed. Results: Unqualified samples accounted for 2.21%. The main nursing factors were hemolysis of blood samples, small clots or partial coagulation in anticoagulated samples, inappropriate collection containers, and blood collection volumes that were too high or too low. Conclusion: Nursing staff should improve their nursing skills to ensure the accuracy of blood sampling and of the test results.

  3. Item analysis of in use multiple choice questions in pharmacology

    Science.gov (United States)

    Kaur, Mandeep; Singla, Shweta; Mahajan, Rajiv

    2016-01-01

    Background: Multiple choice questions (MCQs) are a common method of assessment of medical students. The quality of MCQs is determined by three parameters: difficulty index (DIF I), discrimination index (DI), and distracter efficiency (DE). Objectives: The objective of this study is to assess the quality of MCQs currently in use in pharmacology and discard the MCQs which are not found useful. Materials and Methods: A class test of the central nervous system unit was conducted in the Department of Pharmacology. This test comprised 50 MCQs/items and 150 distracters. A correct response to an item was awarded one mark with no negative marking for incorrect responses. Each item was analyzed for three parameters: DIF I, DI, and DE. Results: DIF I of 38 (76%) items was in the acceptable range (P = 30–70%), 11 (22%) items were too easy (P > 70%), and 1 (2%) item was too difficult (P < 30%). DI of 31 (62%) items was excellent (d > 0.35), of 12 (24%) items was good (d = 0.20–0.34), and of 7 (14%) items was poor (d < 0.20). A total of 50 items had 150 distracters. Among these, 27 (18%) were nonfunctional distracters (NFDs) and 123 (82%) were functional distracters. Items with one NFD were 11 and with two NFDs were 8. Based on these parameters, 6 items were discarded, 17 were revised, and 27 were kept for subsequent use. Conclusion: Item analysis is a valuable tool as it helps us to retain the valuable MCQs and discard the items which are not useful. It also helps in increasing our skills in test construction and identifies the specific areas of course content which need greater emphasis or clarity. PMID:27563581
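The three indices used in this study are straightforward to compute. A minimal Python sketch (the function names, toy data, and the 27% upper/lower-group and 5% nonfunctional-distracter cutoffs are common conventions assumed here, not taken from the paper):

```python
import numpy as np

def item_analysis(score_matrix, item, group_frac=0.27):
    """Difficulty index (DIF I, in %) and discrimination index (DI) for one item.

    score_matrix : 0/1 array, shape (n_examinees, n_items)
    item         : column index of the item to analyze
    group_frac   : fraction used for the upper/lower groups (27% is a
                   common convention, assumed here)
    """
    total = score_matrix.sum(axis=1)
    order = np.argsort(total, kind="stable")       # rank examinees by total score
    k = max(1, int(round(group_frac * len(total))))
    lower = score_matrix[order[:k], item]          # bottom-scoring group
    upper = score_matrix[order[-k:], item]         # top-scoring group
    dif_i = 100.0 * score_matrix[:, item].mean()   # % answering correctly
    di = upper.mean() - lower.mean()               # upper-minus-lower discrimination
    return dif_i, di

def nonfunctional_distracters(choices, key, options="ABCD", cutoff=0.05):
    """Distracters chosen by fewer than `cutoff` of examinees (a common
    definition of a nonfunctional distracter)."""
    choices = np.asarray(choices)
    return [o for o in options
            if o != key and (choices == o).mean() < cutoff]

# Toy data: item 0 perfectly separates high and low scorers.
scores = np.array([
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 0, 0],
    [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0],
])
dif_i, di = item_analysis(scores, item=0)

choices = list("AAAAAAAAAA" "BBBBBB" "CCCC")   # nobody picked distracter D
nfds = nonfunctional_distracters(choices, key="A")
```

On the toy data, item 0 has a difficulty index of 50% and a discrimination index of 1.0, and distracter D is flagged as nonfunctional.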

  4. Bayesian item fit analysis for unidimensional item response theory models.

    Science.gov (United States)

    Sinharay, Sandip

    2006-11-01

    Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model-checking method, a popular Bayesian model-checking tool, to examine item fit for the above-mentioned models. An item fit plot, comparing the observed and predicted proportion-correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p-values (which are natural Bayesian p-values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above-mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.
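The posterior predictive logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not Sinharay's implementation: it uses a Rasch model, stand-in "posterior draws" made by jittering the generating parameters (a real analysis would use MCMC output), and a simple item-total correlation discrepancy in place of the Orlando-Thissen statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items, n_draws = 500, 10, 200

def simulate(theta, beta):
    """Draw a 0/1 response matrix from a Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    return (rng.random(p.shape) < p).astype(int)

# "Observed" data (a stand-in for a real dataset).
theta = rng.normal(0.0, 1.0, n_persons)
beta = np.linspace(-1.5, 1.5, n_items)
obs = simulate(theta, beta)

# Stand-in posterior draws: jittered generating values.
theta_draws = theta + rng.normal(0.0, 0.3, (n_draws, n_persons))
beta_draws = beta + rng.normal(0.0, 0.1, (n_draws, n_items))

def discrepancy(data):
    """Item-total (rest-score) correlation, one value per item."""
    total = data.sum(axis=1)
    return np.array([np.corrcoef(data[:, j], total - data[:, j])[0, 1]
                     for j in range(data.shape[1])])

# Posterior predictive p-value: fraction of replicated datasets whose
# discrepancy is at least as large as the observed one, per item.
t_obs = discrepancy(obs)
ppp = np.zeros(n_items)
for d in range(n_draws):
    rep = simulate(theta_draws[d], beta_draws[d])
    ppp += (discrepancy(rep) >= t_obs)
ppp /= n_draws
# PPP values near 0 or 1 flag items the model fits poorly.
```

Because the "observed" data were generated from the fitted model here, the p-values should mostly be moderate; misfitting items would push them toward the extremes.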

  5. Three controversies over item disclosure in medical licensure examinations

    Directory of Open Access Journals (Sweden)

    Yoon Soo Park

    2015-09-01

    In response to views on the public's right to know, there is growing attention to item disclosure – the release of items, answer keys, and performance data to the public – in medical licensure examinations, and to its potential impact on a test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – (1) fairness and validity, (2) impact on passing levels, and (3) utility of item disclosure – by synthesizing the existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.

  6. Secondary Item Procurement Lead Time Study.

    Science.gov (United States)

    1984-03-01

    SECONDARY ITEM PROCUREMENT LEAD TIME STUDY. LOGISTICS SYSTEMS... ASSISTANT SECRETARY OF THE AIR FORCE (RD&L); DIRECTOR, DEFENSE LOGISTICS AGENCY. SUBJECT: Secondary Item Procurement Lead Time Study. A recent report by the... determination of procurement lead time. A plan for the study is enclosed. In order to achieve the objectives of the procurement lead time study as well as the...

  7. Editorial Changes and Item Performance: Implications for Calibration and Pretesting

    Directory of Open Access Journals (Sweden)

    Heather Stoffel

    2014-11-01

    Previous research on the impact of text and formatting changes on test-item performance has produced mixed results. This matter is important because it is generally acknowledged that any change to an item requires that it be recalibrated. The present study investigated the effects of seven classes of stylistic changes on item difficulty, discrimination, and response time for a subset of 65 items that make up a standardized test for physician licensure completed by 31,918 examinees in 2012. One of two versions of each item (original or revised) was randomly assigned to examinees such that each examinee saw only two experimental items, with each item being administered to approximately 480 examinees. The stylistic changes had little or no effect on item difficulty or discrimination; however, one class of edits – changing an item from an open lead-in (incomplete statement) to a closed lead-in (direct question) – did result in slightly longer response times. Data for nonnative speakers of English were analyzed separately, with nearly identical results. These findings have implications for the conventional practice of re-pretesting (or recalibrating) items that have been subjected to minor editorial changes.

  8. A Comparison of Mantel-Haenszel Differential Item Functioning Parameters. LSAC Research Report Series.

    Science.gov (United States)

    Schnipke, Deborah L.; Roussos, Louis A.; Pashley, Peter J.

    Differential item functioning (DIF) analyses are conducted to investigate how items function in various subgroups. The Mantel-Haenszel (MH) DIF statistic is used at the Law School Admission Council and other testing companies. When item functioning can be well-described in terms of a one- or two-parameter logistic item response theory (IRT) model…
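The MH DIF statistic mentioned here is simple to compute: examinees are stratified by a matching criterion (usually total test score), a 2x2 group-by-correctness table is formed per stratum, and the tables are pooled into a common odds ratio, often reported on the ETS delta scale as MH D-DIF = -2.35 ln(alpha). A minimal Python sketch (the function name and toy data are illustrative):

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel common odds ratio and ETS MH D-DIF for one item.

    item  : 0/1 correct responses
    total : matching criterion (e.g., total test score)
    group : 0 = reference group, 1 = focal group
    """
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for k in np.unique(total):
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(s & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(s & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(s & (group == 1) & (item == 0))  # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    alpha = num / den                   # MH common odds ratio
    mh_d = -2.35 * np.log(alpha)        # ETS delta scale
    return alpha, mh_d

# No-DIF example: within each score stratum, both groups have the same
# odds of success, so alpha == 1 and MH D-DIF == 0.
item  = np.concatenate([np.ones(30), np.zeros(10), np.ones(15), np.zeros(5),
                        np.ones(20), np.zeros(20), np.ones(10), np.zeros(10)])
total = np.concatenate([np.full(60, 1), np.full(60, 2)])
group = np.concatenate([np.zeros(40), np.ones(20), np.zeros(40), np.ones(20)])
alpha, mh_d = mantel_haenszel_dif(item, total, group)
```

In practice, alpha values far from 1 (MH D-DIF far from 0) indicate that one group finds the item harder than comparably able members of the other group.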

  9. A Framework for Examining the Utility of Technology-Enhanced Items

    Science.gov (United States)

    Russell, Michael

    2016-01-01

    Interest in and use of technology-enhanced items has increased over the past decade. Given the additional time required to administer many technology-enhanced items and the increased expense required to develop them, it is important for testing programs to consider the utility of technology-enhanced items. The Technology-Enhanced Item Utility…

  10. Measuring response styles in Likert items.

    Science.gov (United States)

    Böckenholt, Ulf

    2017-03-01

    The recently proposed class of item response tree models provides a flexible framework for modeling multiple response processes. This feature is particularly attractive for understanding how response styles may affect answers to attitudinal questions. Facilitating the disassociation of response styles and attitudinal traits, item response tree models can provide powerful process tests of how different response formats may affect the measurement of substantive traits. In an empirical study, 3 response formats were used to measure the 2-dimensional Personal Need for Structure traits. Different item response tree models are proposed to capture the response styles for each of the response formats. These models show that the response formats give rise to similar trait measures but different response-style effects. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  11. CORRELATION OF AATCC TEST METHOD 150 TO AATCC TEST METHOD 61 FOR USE WITH LAUNDERING DURABILITY STUDIES OF RETROREFLECTIVE ITEMS AS DEFINED IN PURCHASE DESCRIPTION CO/PD 06 05A

    Science.gov (United States)

    2017-06-02

    TECHNICAL REPORT NATICK/TR-17/015. Dates covered: October 2014 – April 2016. Title: Correlation of AATCC Test Method 150 to AATCC Test Method 61 for... data to support the correlation between 5 laundering cycles of the American Association of Textile Chemists & Colorists Test Method 61...

  12. NBC Contamination Survivability, Large Item Exteriors

    Science.gov (United States)

    2007-11-02

    Perform these tasks (timed) in the standard garment. (3) Perform these tasks (timed) in mission-oriented protective posture level 4 (MOPP4). (4)... bring the chamber to the environmental conditions specified for the test. Condition the test item until it has equilibrated at 30 ± 5 °C. Temperature and... condition that all essential operations can be continued in the lowest protective posture consistent with the mission and threat, and without long-term...

  13. A Mixed Effects Randomized Item Response Model

    Science.gov (United States)

    Fox, J.-P.; Wyrick, Cheryl

    2008-01-01

    The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…

  14. Item Veto: Dangerous Constitutional Tinkering.

    Science.gov (United States)

    Bellamy, Calvin

    1989-01-01

    In theory, the item veto would empower the President to remove wasteful and unnecessary projects from legislation. Yet, despite its history at the state level, the item veto is a loosely defined concept that may not work well at the federal level. Much more worrisome is the impact on the balance of power. (Author/CH)

  15. The basics of item response theory using R

    CERN Document Server

    Baker, Frank B

    2017-01-01

    This graduate-level textbook is a tutorial for item response theory that covers both the basics of item response theory and the use of R for preparing graphical presentation in writings about the theory. Item response theory has become one of the most powerful tools used in test construction, yet one of the barriers to learning and applying it is the considerable amount of sophisticated computational effort required to illustrate even the simplest concepts. This text provides the reader access to the basic concepts of item response theory freed of the tedious underlying calculations. It is intended for those who possess limited knowledge of educational measurement and psychometrics. Rather than presenting the full scope of item response theory, this textbook is concise and practical and presents basic concepts without becoming enmeshed in underlying mathematical and computational complexities. Clearly written text and succinct R code allow anyone familiar with statistical concepts to explore and apply item response theory.
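The textbook works in R, but the flavor of the computation it abstracts away is easy to show in a few lines. A sketch of the two-parameter logistic item characteristic curve in Python (the parameter values are arbitrary examples):

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve:
    P(correct | theta) = 1 / (1 + exp(-a * (theta - b))),
    where a is the discrimination and b the difficulty."""
    return 1.0 / (1.0 + np.exp(-a * (np.asarray(theta) - b)))

# At theta == b the probability of a correct response is exactly 0.5;
# the discrimination a controls the slope of the curve at that point.
theta = np.linspace(-3, 3, 7)
probs = icc_2pl(theta, a=1.2, b=0.0)
```

Plotting `probs` against `theta` reproduces the familiar S-shaped curve that the book's R examples are built around.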

  16. The Body Appreciation Scale-2: item refinement and psychometric evaluation.

    Science.gov (United States)

    Tylka, Tracy L; Wood-Barcalow, Nichole L

    2015-01-01

    Considered a positive body image measure, the 13-item Body Appreciation Scale (BAS; Avalos, Tylka, & Wood-Barcalow, 2005) assesses individuals' acceptance of, favorable opinions toward, and respect for their bodies. While the BAS has accrued psychometric support, we improved it by rewording certain BAS items (to eliminate sex-specific versions and body dissatisfaction-based language) and developing additional items based on positive body image research. In three studies, we examined the reworded, newly developed, and retained items to determine their psychometric properties among college and online community (Amazon Mechanical Turk) samples of 820 women and 767 men. After exploratory factor analysis, we retained 10 items (five original BAS items). Confirmatory factor analysis upheld the BAS-2's unidimensionality and invariance across sex and sample type. Its internal consistency, test-retest reliability, and construct (convergent, incremental, and discriminant) validity were supported. The BAS-2 is a psychometrically sound positive body image measure applicable for research and clinical settings.

  17. A nonparametric approach to the analysis of dichotomous item responses

    NARCIS (Netherlands)

    Mokken, R.J.; Lewis, C.

    1982-01-01

    An item response theory is discussed which is based on purely ordinal assumptions about the probabilities that people respond positively to items. It is considered as a natural generalization of both Guttman scaling and classical test theory. A distinction is drawn between construction and evaluation.
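The link to Guttman scaling mentioned above can be made concrete with Loevinger's scalability coefficient H, the central statistic of Mokken's nonparametric approach. A minimal Python sketch (the function name and toy data are illustrative):

```python
import numpy as np
from itertools import combinations

def mokken_h(data):
    """Loevinger's scalability coefficient H for dichotomous items.

    For each item pair, a "Guttman error" is passing the harder item while
    failing the easier one; H compares the observed number of such errors
    with the number expected under marginal independence.
    """
    data = np.asarray(data)
    n, m = data.shape
    p = data.mean(axis=0)                 # proportion passing each item
    obs_err = exp_err = 0.0
    for i, j in combinations(range(m), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        obs_err += np.sum((data[:, easy] == 0) & (data[:, hard] == 1))
        exp_err += n * (1 - p[easy]) * p[hard]
    return 1.0 - obs_err / exp_err

# A perfect Guttman scale contains no Guttman errors, so H == 1.
guttman = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
h = mokken_h(guttman)
```

Real data fall short of the perfect pattern; in Mokken's convention, H >= 0.3 is usually taken as the minimum for a usable scale.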

  18. Detecting Local Item Dependence in Polytomous Adaptive Data

    Science.gov (United States)

    Mislevy, Jessica L.; Rupp, Andre A.; Harring, Jeffrey R.

    2012-01-01

    A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of…

  19. Distributed Item Review: Administrator User Guide. Technical Report #1603

    Science.gov (United States)

    Irvin, P. Shawn

    2016-01-01

    The Distributed Item Review (DIR) is a secure and flexible, web-based system designed to present test items to expert reviewers across a broad geographic area for evaluation of important dimensions of quality (e.g., alignment with standards, bias, sensitivity, and student accessibility). The DIR comprises essential features that allow system…

  20. An item factor analysis and item response theory-based revision of the Everyday Discrimination Scale.

    Science.gov (United States)

    Stucky, Brian D; Gottfredson, Nisha C; Panter, A T; Daye, Charles E; Allen, Walter R; Wightman, Linda F

    2011-04-01

    The Everyday Discrimination Scale (EDS), a widely used measure of daily perceived discrimination, is purported to be unidimensional, to function well among African Americans, and to have adequate construct validity. Two separate studies and data sources were used to examine and cross-validate the psychometric properties of the EDS. In Study 1, an exploratory factor analysis was conducted on a sample of African American law students (N = 589), providing strong evidence of local dependence, or nuisance multidimensionality within the EDS. In Study 2, a separate nationally representative community sample (N = 3,527) was used to model the identified local dependence in an item factor analysis (i.e., bifactor model). Next, item response theory (IRT) calibrations were conducted to obtain item parameters. A five-item, revised-EDS was then tested for gender differential item functioning (in an IRT framework). Based on these analyses, a summed score to IRT-scaled score translation table is provided for the revised-EDS. Our results indicate that the revised-EDS is unidimensional, with minimal differential item functioning, and retains predictive validity consistent with the original scale.