Open Access. Powered by Scholars. Published by Universities.®

Education Commons

Educational Assessment, Evaluation, and Research

University of Massachusetts Amherst

Classical test theory

Full-Text Articles in Education

An Investigation Of Subtest Score Equating Methods Under Classical Test Theory And Item Response Theory Frameworks, Minjeong Shin, Aug 2015

Doctoral Dissertations

Test scores are usually equated only at the total score level. If a test mainly measures a single trait, meaning that the test is essentially unidimensional, equating at the total score level could be the best choice. However, when a test is composed of subtests with negligible relationships among them, equating each subtest separately is the best choice. When the subtests are moderately correlated, equating each subtest individually may be misleading because it ignores the relationships among the subtests. This study applied and compared several possible subtest score equating methods based on classical …


Evaluating IRT- And CTT-Based Methods Of Estimating Classification Consistency And Accuracy Indices From Single Administrations, Nina Deng, Sep 2011

Open Access Dissertations

Three decision consistency and accuracy (DC/DA) methods were evaluated: the Livingston and Lewis (LL) method, the LEE method, and the Hambleton and Han (HH) method. The purposes of the study were (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true" DC/DA indices in various conditions, and (3) to assess the impact of the choice of reliability estimate on the LL method.

Four simulation studies were conducted. Study 1 looked at various test lengths. Study 2 focused on local item dependency (LID). Study 3 checked the consequences of …