Open Access. Powered by Scholars. Published by Universities.®

Education Commons

Articles 1 - 7 of 7

Full-Text Articles in Education

A Comparison Study Of IRT Calibration Methods For Mixed-Format Tests In Vertical Scaling, Huijuan Meng Dec 2007

Theses and Dissertations

The purpose of this dissertation is to investigate how using different IRT calibration methods may affect student achievement growth pattern recovery. In this study, 96 vertical scales (4 × 2 × 2 × 2 × 3) are constructed using different combinations of IRT calibration methods (separate, pair-wise concurrent, semi-concurrent, and concurrent), lengths of common-item set (10 vs. 20 common items), types of common-item set (dichotomous only vs. dichotomous and polytomous), and numbers of polytomous items (6 vs. 12) for 3 simulated datasets which differ in sample size (500, 1000, 5000 per grade). Three criteria (RMSE, SE, and bias) are used to evaluate the performance of these calibration methods on proficiency score distribution recovery over 40 replications. The results suggest that for data used in this study, when parameters of interest are related to measuring students' growth (i.e., proficiency score mean and effect size), pair-wise concurrent calibration overall produced the most accurate results. When parameters of interest are related to performance variability (i.e., standard deviation), concurrent calibration in general produced the most stable and accurate estimates. When the emphasis is on classifying students' performance accurately, pair-wise concurrent and semi-concurrent calibration, taken collectively, outperformed concurrent and separate calibration as sample size increased. Overall, pair-wise concurrent calibration was more effective than the other methods in constructing a vertical scale, and use of either separate or concurrent calibration to create a vertical scale seems least warranted.

In addition, it is observed that (1) larger sample size stabilized estimation results and reduced error; (2) compared to tests containing 10 common items, errors and biases were in general smaller for tests with 20 common items; (3) compared to tests containing a mixed-format common-item set, errors and biases were usually smaller for tests containing a dichotomous-only common-item set; (4) for tests containing a mixed-format common-item set, errors and biases were ...
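
The abstract above names three recovery criteria (RMSE, SE, and bias) computed over replications, and a 4 × 2 × 2 × 2 × 3 design. The following is a minimal sketch of how such criteria are commonly computed, not the dissertation's own code. The function name `recovery_criteria`, the factor labels, and the example numbers are hypothetical, and SE is assumed to be the standard deviation of the estimates across replications (dividing by the number of replications), so that RMSE² = bias² + SE² holds exactly.

```python
import math
from itertools import product

def recovery_criteria(estimates, true_value):
    """Return (bias, se, rmse) for replicated estimates of one known parameter."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    # Bias: average estimate minus the generating (true) value.
    bias = mean_est - true_value
    # SE: spread of the estimates around their own mean (assumed population form).
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
    # RMSE: spread of the estimates around the true value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, se, rmse

# Made-up estimates of a proficiency mean whose generating value is 0.50.
estimates = [0.48, 0.55, 0.51, 0.46]
bias, se, rmse = recovery_criteria(estimates, true_value=0.50)

# Enumerate the crossed conditions described in the abstract (labels are illustrative).
methods = ["separate", "pair-wise concurrent", "semi-concurrent", "concurrent"]
common_set_lengths = [10, 20]
common_set_types = ["dichotomous only", "dichotomous and polytomous"]
polytomous_counts = [6, 12]
sample_sizes = [500, 1000, 5000]
conditions = list(product(methods, common_set_lengths, common_set_types,
                          polytomous_counts, sample_sizes))
n_conditions = len(conditions)  # 4 * 2 * 2 * 2 * 3
```

Under this definition of SE, an estimator with small bias can still have a large RMSE if its estimates vary widely across replications, which is why the three criteria are reported together.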


Effectiveness Of The Hybrid Levine Equipercentile And Modified Frequency Estimation Equating Methods Under The Common-Item Nonequivalent Groups Design, Jianlin Hou Dec 2007

Theses and Dissertations

The purpose of this study was to evaluate the effectiveness of the hybrid Levine equipercentile (Hybrid LE) and modified frequency estimation (MFE) equating methods in improving accuracy of equating as compared to the percentile rank frequency estimation (FE), kernel frequency estimation (Kernel FE) and percentile rank chained equipercentile (CE) equating methods under the common-item nonequivalent groups (CINEG) design. The methods were compared under a wide variety of simulated conditions with log-linear pre-smoothing. The simulated conditions reflected differences in sample size, group proficiency, test length, ratio of common items and the similarity of form difficulty. An item response theory (IRT) model ...


Pre-Service Teachers' Perspectives On Race: The Impact Of Key Experiences, Lanese Kwegyir Aggrey Jan 2007

Theses and Dissertations

This study seeks insight into the experiences of pre-service teachers concerning issues of race prior to their embarking on a required multicultural education course. The study seeks to explore these questions: What kinds of racial knowledge or familiarity do pre-service teachers bring with them to their undergraduate classrooms? What key events have shaped their attitudes and their ability to talk about race? What concerns do they have about their ability to teach future students who are racially different from themselves?

This study was conducted using qualitative methods, with a series of questions posed to small focus groups in a university ...


The Development Of Undergraduate Students' Facility With Disciplinary Discourses Through Collaboration Between Faculty Members And Librarians, Michelle Holschuh Simmons Jan 2007

Theses and Dissertations

In this study, I examine the ways in which undergraduate students acquire the discourses of their chosen major. In particular, I focus on the complementary contributions of faculty members and academic librarians in students' acquisition of disciplinary discourses. Grounded in genre theory and Gee's (1996) notion of primary and secondary discourses, the study highlights the complex processes that students undergo to acquire and internalize the discourse of an academic discipline.

Using a qualitative case-study approach, I consider the interrelated experiences of five undergraduate students, three faculty members, and two librarians at a small liberal arts college in the Midwest ...


A Comparison Of Calibration Methods And Proficiency Estimators For Creating IRT Vertical Scales, Jungnam Kim Jan 2007

Theses and Dissertations

The main purpose of this study was to construct different vertical scales based on various combinations of calibration methods and proficiency estimators to investigate the impact different choices may have on these properties of the vertical scales that result: grade-to-grade growth, grade-to-grade variability, and the separation of grade distributions. Calibration methods investigated were concurrent calibration, separate calibration, and fixed a, b, and c item parameters for common items with simple prior updates (FSPU). Proficiency estimators investigated were Maximum Likelihood Estimator (MLE) with pattern scores, Expected A Posteriori (EAP) with pattern scores, pseudo-MLE with summed scores, pseudo-EAP with summed scores, and ...


Dichotomous Or Polytomous Model? Equating Of Testlet-Based Tests In Light Of Conditional Item Pair Correlations, Jin Zhang Jan 2007

Theses and Dissertations

The performance of dichotomous and polytomous IRT models in equating testlet-based tests was compared in this study.

To clarify the conditions under which dichotomous and polytomous item response models produce differing results, the DIMTEST program was used for testing essential unidimensionality, and a bias-corrected index (Final Condcorr) was adapted in this study for measuring local item dependence (LID).

True score and observed score equating using either the three-parameter logistic or generalized partial credit model was conducted for three subtests of the Iowa Tests of Educational Development (ITED) and seven simulated datasets. Two factors were manipulated in generating the simulated ...


An Investigation Of A Bayesian Decision-Theoretic Procedure In The Context Of Mastery Tests, Ming-Chuan Hsieh Jan 2007

Theses and Dissertations

The purpose of this study was to extend Glas and Vos's (1998) Bayesian procedure to the 3PL IRT model by using the MCMC method. In the context of fixed-length mastery tests, the Bayesian decision-theoretic procedure was compared with two conventional procedures (conventional-Proportion Correct and conventional-EAP) across different simulation conditions. Several simulation conditions were investigated, including two loss functions (linear and threshold loss function), three item pools (high discrimination, moderate discrimination and real item pool) and three test lengths (20, 40 and 60). Different loss parameters were manipulated in the Bayesian decision-theoretic procedure to examine the effectiveness of ...
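
For readers unfamiliar with the 3PL IRT model mentioned in the abstract above, the following is a minimal sketch of its standard item response function. It does not reproduce the study's Bayesian or MCMC procedure. The function name `prob_correct_3pl`, the example item parameters, and the default scaling constant `d = 1.0` are assumptions (some presentations use 1.7).

```python
import math

def prob_correct_3pl(theta, a, b, c, d=1.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee proficiency
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    d: scaling constant (assumed 1.0 here)
    """
    logistic = 1.0 / (1.0 + math.exp(-d * a * (theta - b)))
    return c + (1.0 - c) * logistic

# Hypothetical item: when proficiency equals difficulty, the probability is
# halfway between the lower asymptote c and 1.
p_at_difficulty = prob_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
p_low = prob_correct_3pl(theta=-3.0, a=1.2, b=0.0, c=0.2)
p_high = prob_correct_3pl(theta=3.0, a=1.2, b=0.0, c=0.2)
```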