Digital Commons Network

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 18 of 18

Full-Text Articles in Entire DC Network

Examining The Effects Of Specifying Bayesian Priors On The Wald's Test For DIF, Paulius Satkus, Christine E. DeMars Oct 2019

Department of Graduate Psychology - Faculty Scholarship

No abstract provided.


An Applied Example Of A Two-Tier Multiple-Group Testlet Model, Paulius Satkus, Christine E. DeMars Oct 2019

Department of Graduate Psychology - Faculty Scholarship

No abstract provided.


Considerations In S-χ²: Rest Score Or Summed Score, Priors, And Violations Of Normality, Christine E. DeMars, Derek Sauder Apr 2019

Department of Graduate Psychology - Faculty Scholarship

The S-χ² item fit index is one of the few item fit indices that appears to maintain accurate Type I error rates. This study explored grouping examinees by the rest score or summed score, prior distributions for the item parameters, and the shape of the ability distribution. Type I error was slightly closer to the nominal level for the total-score S-χ² for the longest tests, but power was higher for the rest-score S-χ² in every condition where power was < 1. Prior distributions reduced the proportion of estimates with extreme standard errors but slightly inflated the Type I error rates in some conditions. When the ability distribution was not normally distributed, integrating over an empirically-estimated distribution yielded Type I error rates closer to the nominal value than integrating over a normal distribution.
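
For readers unfamiliar with the statistic, here is a minimal Python sketch of the grouping step that distinguishes the two conditions: the rest score excludes the studied item from the summed score, and the statistic compares observed to model-expected proportions correct within score groups. The expected proportions are assumed to be supplied by a fitted IRT model (in practice via the Lord-Wingersky recursion, not implemented here).

```python
import numpy as np

def s_chi2(responses, item, expected_by_group, use_rest_score=True):
    """Grouping step of the S-chi-squared fit statistic (Orlando & Thissen)
    as a sketch.

    responses: (n_examinees, n_items) 0/1 matrix.
    item: index of the studied item.
    expected_by_group: dict mapping score group -> model-implied
        P(correct), assumed to come from a fitted IRT model.
    """
    total = responses.sum(axis=1)
    # Rest score excludes the studied item; the summed score keeps it.
    score = total - responses[:, item] if use_rest_score else total

    chi2 = 0.0
    for k, e_k in expected_by_group.items():
        in_group = score == k
        n_k = in_group.sum()
        if n_k == 0 or not 0 < e_k < 1:
            continue
        o_k = responses[in_group, item].mean()  # observed proportion correct
        chi2 += n_k * (o_k - e_k) ** 2 / (e_k * (1 - e_k))
    return chi2
```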


Multilevel IRT: When Is Local Independence Violated?, Christine E. DeMars, Jessica Jacovidis Apr 2016

Department of Graduate Psychology - Faculty Scholarship

Calibration data are often collected within schools. This illustration shows that random school effects on ability do not bias IRT parameter estimates or their standard errors. However, random school effects on item difficulty lead to bias in item discrimination estimates and inflated standard errors for difficulty and ability.
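
A quick way to see the distinction is to simulate it. The sketch below (all parameter values hypothetical) puts a random school effect either on ability, where it is simply absorbed into theta, or on item difficulty, where each school effectively faces shifted items and local independence is violated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, per_school, n_items = 50, 40, 20
b = rng.normal(0, 1, n_items)                  # item difficulties

# School effect on ability: absorbed into theta, so item parameter
# estimates should be unaffected (the benign case in the abstract).
school_ability = rng.normal(0, 0.5, n_schools)
theta = (school_ability[:, None] +
         rng.normal(0, 1, (n_schools, per_school))).ravel()

# School effect on item difficulty: each school effectively faces
# shifted items, which violates local independence (the harmful case).
school_item = rng.normal(0, 0.3, (n_schools, n_items))
b_eff = (b + school_item).repeat(per_school, axis=0)   # align with theta

p = 1 / (1 + np.exp(-(theta[:, None] - b_eff)))        # Rasch model
x = (rng.uniform(size=p.shape) < p).astype(int)        # 0/1 responses
```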


Modeling DIF With The Rasch Model: The Unfortunate Combination Of Mean Ability Differences And Guessing, Christine E. DeMars, Daniel P. Jurich Apr 2014

Department of Graduate Psychology - Faculty Scholarship

Concerns with using the Rasch model to estimate DIF when there are large group differences in ability (impact) and the data follow a 3PL model are discussed. This demonstration showed that, with large group ability differences, difficult non-DIF items appeared to favor the focal group and, to a smaller degree, easy non-DIF items appeared to favor the reference group. Correspondingly, the effect sizes for DIF items were biased. With equal ability distributions for the reference and focal groups, DIF effect sizes were unbiased for non-DIF items; effect sizes were somewhat overestimated in absolute value for difficult items and somewhat underestimated …
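
The mechanism can be illustrated numerically. The sketch below is a crude stand-in for a joint Rasch calibration, not the study's procedure: for each group it finds the Rasch difficulty that reproduces the group's expected proportion correct on a DIF-free 3PL item. Because the guessing floor matters more for the lower-ability group, the two matched difficulties differ, so the item can appear to show DIF (all values hypothetical; requires SciPy).

```python
import numpy as np
from scipy.optimize import brentq

# A difficult, DIF-free 3PL item; guessing lifts the lower asymptote.
a, b, c = 1.2, 1.5, 0.2
p3pl = lambda th: c + (1 - c) / (1 + np.exp(-1.7 * a * (th - b)))
rasch = lambda th, d: 1 / (1 + np.exp(-(th - d)))

def matched_rasch_b(theta_mean, n=100_000, seed=0):
    # Rasch difficulty reproducing this group's expected p-value --
    # a crude stand-in for joint calibration, for illustration only.
    th = np.random.default_rng(seed).normal(theta_mean, 1, n)
    target = p3pl(th).mean()
    return brentq(lambda d: rasch(th, d).mean() - target, -8, 8)

b_ref = matched_rasch_b(0.0)    # reference group, mean ability 0
b_foc = matched_rasch_b(-1.0)   # focal group, one SD lower (impact)
print(b_ref, b_foc)             # the gap mimics DIF on a DIF-free item
```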


A Comparison Of Limited-Information And Full-Information Methods In Mplus For Estimating IRT Parameters For Non-Normal Populations, Christine E. DeMars May 2010

Department of Graduate Psychology - Faculty Scholarship

In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2PL IRT model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and platykurtic latent variable distributions, three methods were compared in Mplus: limited-information, full-information integrating over a normal distribution, and full-information integrating over the known underlying distribution. For the most discriminating easy or difficult items, limited-information estimates of both parameters were considerably biased. Full-information estimates obtained by integrating over a normal distribution were somewhat biased. Full-information estimates obtained by integrating over the true …
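
As a sketch of what "integrating over a distribution" means here: the full-information marginal likelihood for the 2PL evaluates each response pattern against quadrature points on the latent scale, and the normal-versus-empirical comparison amounts to swapping the quadrature weights. This is a toy illustration, not Mplus's implementation; all data and parameter values are made up.

```python
import numpy as np

def marginal_loglik(x, a, b, nodes, weights):
    """2PL marginal log-likelihood over a quadrature-defined latent
    distribution; swap `weights` to integrate over a normal vs. an
    empirically estimated shape."""
    p = 1 / (1 + np.exp(-a * (nodes[:, None] - b)))            # (Q, J)
    like = np.prod(np.where(x[:, None, :], p, 1 - p), axis=2)  # (N, Q)
    return np.log(like @ weights).sum()

rng = np.random.default_rng(1)
a_true = np.array([1.5, 1.0, 0.8]); b_true = np.array([-1.0, 0.0, 1.0])
theta = rng.normal(0, 1, 500)
x = rng.uniform(size=(500, 3)) < 1 / (1 + np.exp(-a_true * (theta[:, None] - b_true)))

nodes = np.linspace(-4, 4, 41)
w = np.exp(-nodes**2 / 2); w /= w.sum()   # normal quadrature weights
print(marginal_loglik(x, a_true, b_true, nodes, w))
```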


Individual Score Validity And Student Effort In Higher Education Assessment, Christine E. DeMars, Steven L. Wise, Lisa F. Smith Apr 2009

Department of Graduate Psychology - Faculty Scholarship

This study explored the use of the five invalidity flags plus a new sixth flag based on self-reported effort. Participants were 155 entering first-year university students who were tested during an orientation week and again 18 months later. The instruments were a faculty-developed test of oral communication skills with 40 four-option multiple-choice items and a self-report measure of test-taking motivation (Student Opinion Survey; Sundre, 1999, adapted from Wolf and Smith, 1995). Results indicated that the flags explored in this study generalized well to university students. There was a moderate correlation between Response Time Effort and Effort as measured by the …
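
As context for the Response Time Effort flag mentioned in the results: RTE (Wise & Kong, 2005) is the proportion of an examinee's responses that take longer than an item-specific rapid-guessing threshold. A minimal sketch, with hypothetical times and thresholds:

```python
import numpy as np

def response_time_effort(times, thresholds):
    """Response Time Effort: the proportion of items answered slower
    than an item-specific rapid-guessing threshold. The thresholds
    here are hypothetical; in practice they are set from the observed
    response-time distributions."""
    solution_behavior = times >= thresholds   # (N, n_items) booleans
    return solution_behavior.mean(axis=1)     # one RTE score per examinee

times = np.array([[12.0, 30.0, 2.0],
                  [1.5, 2.0, 3.0]])           # seconds per item
rte = response_time_effort(times, thresholds=np.array([5.0, 5.0, 5.0]))
# Examinees with RTE below some cutoff (e.g., 0.90) could be flagged.
```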


Scoring Multiple Choice Items: A Comparison Of IRT And Classical Polytomous And Dichotomous Methods, Christine E. DeMars Mar 2008

Department of Graduate Psychology - Faculty Scholarship

Four methods of scoring multiple-choice items were compared: dichotomous classical (number-correct), polytomous classical (classical optimal scaling, COS), dichotomous IRT (three-parameter logistic, 3PL), and polytomous IRT (nominal response, NR). Data were generated to follow either a nominal response model or a non-parametric model, based on empirical data. The polytomous models, which weighted the distractors differentially, yielded small increases in reliability compared to their dichotomous counterparts. The polytomous IRT estimates were less biased than the dichotomous IRT estimates for lower scores. The classical polytomous scores were as reliable as, and sometimes more reliable than, the IRT polytomous scores. This was …
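
To make the "weighting the distractors" idea concrete, here is a rough stand-in for classical optimal scaling: each option of each item is weighted by the standardized mean criterion score of the examinees who chose it. This illustrates the idea only; the actual COS procedure derives weights to maximize reliability.

```python
import numpy as np

def option_weights(choices, criterion):
    """Weight each response option by the standardized mean criterion
    score of examinees choosing it (a rough stand-in for COS).

    choices: (N, n_items) array of option codes (e.g., 0..3).
    criterion: (N,) provisional scores, e.g., number-correct.
    """
    z = (criterion - criterion.mean()) / criterion.std()
    return [{opt: z[choices[:, j] == opt].mean()
             for opt in np.unique(choices[:, j])}
            for j in range(choices.shape[1])]

def polytomous_score(choices, weights):
    # Sum each examinee's option weights across items.
    return np.array([[weights[j][c] for j, c in enumerate(row)]
                     for row in choices]).sum(axis=1)
```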


Neutral Or Unsure: Is There A Difference?, Christine E. DeMars, T. Dary Erwin Aug 2005

Department of Graduate Psychology - Faculty Scholarship

University students responded to a survey measuring identity development using a 4-point Likert-type scale with two additional options: neutral and unsure. The level of identity development of students who chose neutral was compared to the level of identity development of students who chose unsure on the same item. On average, these two groups of students had similar scores. Neutral and unsure did not seem to be used to indicate different levels of the construct of interest. Often these two categories were used as a middle response, but on one scale they were used as a moderately high response.


Scoring Subscales Using Multidimensional Item Response Theory Models, Christine E. DeMars Aug 2005

Department of Graduate Psychology - Faculty Scholarship

Several methods for estimating item response theory scores for multiple subtests were compared. These methods included two multidimensional item response theory models: a bifactor model, where each subtest score was a composite based on the primary trait measured by the set of subtests and a secondary trait measured by the individual subtest, and a model where the traits measured by the subtests were separate but correlated. Composite scores based on unidimensional item response theory, with each subtest borrowing information from the other subtests, as well as independent unidimensional scores for each subtest, were also considered. Correlations among scores from all …
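
The two multidimensional structures can be summarized by their loading patterns. The values below are hypothetical and only illustrate which loadings are free versus fixed to zero for six items split across two subtests.

```python
import numpy as np

# Bifactor: every item loads on a general trait plus its own
# subtest-specific trait; the specific traits are orthogonal
# to the general trait.
bifactor = np.array([
    # general, specific1, specific2
    [0.7, 0.4, 0.0],
    [0.6, 0.5, 0.0],
    [0.7, 0.3, 0.0],
    [0.6, 0.0, 0.4],
    [0.7, 0.0, 0.5],
    [0.5, 0.0, 0.3],
])

# Correlated traits: each item loads on one subtest trait only,
# and the traits correlate (r = .6 here is a made-up value).
correlated = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.8, 0.0],
    [0.0, 0.7], [0.0, 0.8], [0.0, 0.6],
])
trait_corr = np.array([[1.0, 0.6],
                       [0.6, 1.0]])
```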


A Comparison Of The Recovery Of Parameters Using The Nominal Response And Generalized Partial Credit Models, Christine E. DeMars Apr 2004

Department of Graduate Psychology - Faculty Scholarship

In this simulation study, data were generated such that some items fit the generalized partial credit model (GPCM) while other items fit the nominal response model (NRM) but not the constraints of the GPCM. The purpose was to explore (a) how the errors in parameter estimation were affected by using the GPCM when the constraints of the GPCM were inappropriate, and (b) how the errors were affected by using the less-constrained NRM when the constraints of the GPCM were appropriate. With large sample sizes, there were considerable gains in precision from using the NRM when the GPCM was inappropriate, and …
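
The constraint at issue can be stated compactly: the GPCM is the nominal response model with category slopes forced to be equally spaced in category order (a_k = k·a). A small sketch with made-up parameter values:

```python
import numpy as np

def nrm_probs(theta, a, c):
    """Nominal response model: P(X = k | theta) is proportional to
    exp(a_k * theta + c_k), normalized over categories."""
    z = np.outer(theta, a) + c
    ez = np.exp(z - z.max(axis=1, keepdims=True))  # stable softmax
    return ez / ez.sum(axis=1, keepdims=True)

# GPCM-compatible slopes are equally spaced: a_k = k * a.
a_gpcm = 1.2 * np.arange(4)
# NRM-only slopes (non-ordered spacing): fitting the GPCM to data
# like this imposes an inappropriate constraint.
a_free = np.array([0.0, 1.5, 1.2, 2.8])

p = nrm_probs(np.array([-1.0, 0.0, 1.0]), a_free, c=np.zeros(4))
```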


Item Parameter Drift: The Impact Of The Curricular Area, Christine E. DeMars Apr 2004

Department of Graduate Psychology - Faculty Scholarship

Items from tests in two content areas, information literacy and global issues, were examined for item parameter drift across four years. The items on the information literacy test were expected to show more drift because the content of this field changes more rapidly and because the test changed from low stakes to high stakes for students while the other test remained low stakes. More items did show drift on the information literacy test, but the drift was not always readily explained. Further, some items did not fit the drift model available in BILOG-MG, either because the drift was a …


Recovery Of Graded Response And Partial Credit Parameters In MULTILOG And PARSCALE, Christine E. DeMars Apr 2003

Department of Graduate Psychology - Faculty Scholarship

Using simulated data, MULTILOG and PARSCALE were compared on their recovery of item and trait parameters under the graded response and generalized partial credit item response theory models. The shape of the latent population distribution (normal, skewed, or uniform) and the sample size (250 or 500) were varied. Parameter estimates were essentially unbiased under all conditions, and the root mean square error was similar for both software packages. The choice between these packages can therefore be based on considerations other than the accuracy of parameter estimation.


Missing Data And IRT Item Parameter Estimation, Christine E. DeMars Apr 2003

Department of Graduate Psychology - Faculty Scholarship

Non-randomly missing data have theoretically different implications for item parameter estimation depending on whether joint maximum likelihood or marginal maximum likelihood methods are used in the estimation. The objective of this paper is to illustrate what can potentially happen, under these estimation procedures, when there is an association between ability and the absence of a response. In this example, data are missing because some students, particularly low-ability students, did not complete the test.


Modeling Student Outcomes In A General Education Course With Hierarchical Linear Models, Christine E. DeMars Jun 2002

Department of Graduate Psychology - Faculty Scholarship

When students are nested within course sections, the assumption of independence of residuals is unlikely to be met, unless the course section is explicitly included in the model. Hierarchical linear modeling (HLM) allows for modeling the course section as a random effect, leading to more accurate standard errors. In this study, students chose one of four themes for a communications course, with multiple sections and instructors within each theme. HLM was used to test for differences by theme in scores on a final exam; the differences were not significant when SAT scores were controlled.
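
As a sketch of the kind of model described, here is a random-intercept specification using statsmodels' mixed-effects API. The data file and column names are hypothetical; only the structure (final exam regressed on theme and SAT, with section as a random effect) mirrors the abstract.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per student, with their course section,
# chosen theme, SAT score, and final exam score.
df = pd.read_csv("course_outcomes.csv")

# Random intercept for course section; theme as a categorical fixed
# effect, controlling for SAT, as in the study's design.
model = smf.mixedlm("final_exam ~ C(theme) + sat", df,
                    groups=df["section"])
result = model.fit()
print(result.summary())
```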


Equating Multiple Forms Of A Competency Test: An Item Response Theory Approach, Christine E. DeMars Jun 2002

Department of Graduate Psychology - Faculty Scholarship

A competency test was developed to assess students' skills in using electronic library resources. Because all students were required to pass the test, and had multiple opportunities to do so, multiple test forms were desired. Standards had been set on the original form, and minor differences in form difficulty needed to be taken into account. Students were randomly administered one of six new test forms; each form contained the original items and 12 pilot items which were different on each form. The pilot items were then calibrated to the metric of the original items and incorporated in two additional operational …
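
One common way to "calibrate pilot items to the metric of the original items" is a linear transformation estimated from the common (anchor) items; the mean/sigma method is sketched below. This is a generic illustration of the idea, and the study's actual calibration procedure may differ.

```python
import numpy as np

def mean_sigma_link(a_new, b_new, anchor_b_new, anchor_b_old):
    """Mean/sigma linking: rescale new-form item parameters onto the
    old form's metric using the difficulties of the anchor items.
    The transformation is b* = A*b + B and a* = a/A, where A and B
    come from the anchor-item difficulty means and SDs."""
    A = anchor_b_old.std() / anchor_b_new.std()
    B = anchor_b_old.mean() - A * anchor_b_new.mean()
    return a_new / A, A * b_new + B
```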


Does The Relationship Between Motivation And Performance Differ With Ability?, Christine E. DeMars Apr 1999

Department of Graduate Psychology - Faculty Scholarship

In this study of college students taking a science test or a social science test under non-consequential conditions, performance was positively correlated with self-reported motivation. The association, though, was smaller for students of lower ability (as measured by the SAT).


Item Estimates Under Low-Stakes Conditions: How Should Omits Be Treated?, Christine E. DeMars Apr 1998

Department of Graduate Psychology - Faculty Scholarship

Using data from a pilot test of science and math, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice and others were constructed-response (open-ended). Four sets of estimates were obtained by crossing sex (males, females) with the treatment of omitted items (scored as incorrect or treated as not-presented/not-reached). Then, using data from an operational test (high-stakes, for diploma endorsement), the fit of these item estimates was assessed. In science, the fit was quite good under all conditions. In math, the fit was better for girls than for boys, the …
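
The two treatments of omits correspond to two different likelihood contributions. A minimal Rasch sketch, with omitted responses coded as NaN (dichotomous items only; the study's multi-point items would use a partial-credit likelihood instead):

```python
import numpy as np

def item_loglik(x, theta, b, omits="not_presented"):
    """Rasch log-likelihood for one item; omitted responses are NaN.
    'incorrect' scores omits as 0; 'not_presented' drops them from
    the likelihood (the two treatments compared in the abstract)."""
    p = 1 / (1 + np.exp(-(theta - b)))
    if omits == "incorrect":
        x = np.nan_to_num(x, nan=0.0)   # omit counted as a wrong answer
    answered = ~np.isnan(x)             # omit excluded from the product
    ll = np.where(np.nan_to_num(x) == 1, np.log(p), np.log(1 - p))
    return ll[answered].sum()
```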