Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Articles 1 - 30 of 43

Full-Text Articles in Social and Behavioral Sciences

A Novel Examination Of None-Of-The-Above As It Influences Examinee Item Responses, Kathryn N. Thompson May 2023

Dissertations, 2020-current

It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the “degree to which test scores are affected by processes that are extraneous to the test’s intended purpose” (AERA et al., 2014, p. 12). One possible source of construct irrelevant variance is violating item-writing guidelines, such as to “avoid the use of none-of-the-above” in multiple-choice items (Rodriguez, 2016, p. 268).

Numerous studies have been conducted with …


Using IRTrees To Account For Response Style Effects Between Item Formats, Stephanie Leroy May 2023

Masters Theses, 2020-current

Response styles are consistent person-traits that are defined as the tendency to systematically select responses unrelated to the construct being measured (Paulhus, 1991). Response styles introduce construct-irrelevant variance that distorts observed scores on a measure and biases interpretation of the data. The current study looks at midpoint response style (MRS) and extreme response style (ERS). MRS is the tendency to select the midpoint of a rating scale, while ERS is the tendency to select the endpoints of a rating scale. Previous research sought to either account for response style effects or prevent them – the current study does both. To …


Double Dosing: Investigating The Utility Of Multiple Priming Questions On Test-Taking Motivation, Mara Mcfadden May 2023

Masters Theses, 2020-current

Priming examinees with questions about intended effort prior to testing has been shown to significantly increase examinee expended effort via self-reported effort and response-time effort. However, this question-behavior effect seems to wear off later in a testing session, specifically when a test is given second in the session. I examined whether administering a second “dose” of the question-behavior effect could combat the decrease in examinee effort later in a testing session. To evaluate whether “double dosing” could increase examinee effort later in a testing session, I randomly assigned examinees to one of three question conditions prior to completing two low-stakes …


Many-Facet Rasch Designs: How Should Raters Be Assigned To Examinees?, Christine E. DeMars, Yelisey A. Shapovalov, John D. Hathcoat Apr 2023

Department of Graduate Psychology - Faculty Scholarship

In Facets models, raters should be connected, and there are multiple ways to connect raters. Keeping the number of ratings constant and two raters scoring each examinee, the standard error of both rater severity and examinee ability was higher when raters scored one examinee in common with many different raters than when they scored many examinees in common with two raters. However, the differences were small, especially for the standard error of examinee ability. Alternatively, when only a subset of examinees were scored by two or more raters, the smallest standard errors were achieved when all raters scored a common …


Rapid Response Behavior Before And During The Pandemic, Katarina E. Schaefer May 2022

Masters Theses, 2020-current

Different levels of examinee motivation pose a validity threat to the interpretation of test scores. This problem is heightened in low-stakes, remote testing environments. Though some ways exist to gauge average motivation throughout testing, fewer ways exist to gauge motivation fluctuations throughout a single test. One of those ways is through response times. Specifically, rapid response behavior occurs when examinees quickly answer an item without reading or engaging with the item. At James Madison University (JMU), students participating in campus-wide Assessment Days typically experienced an in-person, proctored Assessment Day. However, that changed during the pandemic. During the pandemic, examinees participated …
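
As an illustration of how response times can flag rapid responses, the following is a minimal sketch and not the thesis's actual procedure. The function names, the sample times, and the 3-second threshold are all hypothetical; in practice, thresholds are typically chosen per item. The second function summarizes the flags as the proportion of items that were not answered rapidly.

```python
from typing import Sequence


def flag_rapid_responses(
    response_times: Sequence[float], thresholds: Sequence[float]
) -> list[bool]:
    """Return True for each item answered faster than its time threshold."""
    if len(response_times) != len(thresholds):
        raise ValueError("Need exactly one threshold per item response time.")
    return [rt < th for rt, th in zip(response_times, thresholds)]


def response_time_effort(
    response_times: Sequence[float], thresholds: Sequence[float]
) -> float:
    """Proportion of items NOT flagged as rapid responses."""
    flags = flag_rapid_responses(response_times, thresholds)
    return 1.0 - sum(flags) / len(flags)


# Hypothetical data: seconds spent on four items by one examinee,
# with an assumed 3-second threshold for every item.
times = [2.1, 14.0, 1.5, 22.3]
thresholds = [3.0, 3.0, 3.0, 3.0]

flags = flag_rapid_responses(times, thresholds)
rte = response_time_effort(times, thresholds)
print(flags, rte)
```

Tracking the flags item by item, rather than only the summary proportion, is what allows fluctuations in motivation within a single test to be seen.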


Writing While Black: African American Vernacular English (AAVE) And Perceived Writing Performance, Jaylin N. Nesbitt May 2022

Masters Theses, 2020-current

In the education system, there have historically been inequities that have severely disadvantaged Black students academically. One area in which these inequities surface is on writing assessments in the form of lower scores. I argue that because the U.S. education system is centered around Standard American English (SAE), it disadvantages those from different linguistic backgrounds, specifically Black students, as they are most likely to be speakers of African American Vernacular English (AAVE). Although there are theoretical justifications for this, past literature has not empirically tied inequities on writing assessments to Black students’ use of AAVE. The current study used Natural …


The Use Of Complex-Structure Items In Multistage Testing, Paulius Satkus May 2022

Dissertations, 2020-current

When developing tests, measurement experts may prefer simple-structure items because they measure one trait, which simplifies scoring and score interpretation. Conversely, complex-structure items may be preferred to reflect the complexity of multidimensional constructs. The current study sought to address the gap in the literature of multi-stage testing by conducting a simulation study with a hypothetical two-stage adaptive test with the purpose of comparing the performance of simple- and complex-structure items. The findings suggest that with a longer test (60 items), the two types of items performed similarly with respect to bias and RMSE of the trait estimates. For the …
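
For readers unfamiliar with the two recovery criteria named above, here is a minimal sketch of how bias and RMSE are commonly computed in simulation studies, where the true trait values are known. The function names and the sample numbers are hypothetical and are not taken from the dissertation.

```python
import math
from typing import Sequence


def bias(estimates: Sequence[float], true_values: Sequence[float]) -> float:
    """Mean signed error: average of (estimate - true value)."""
    errors = [e - t for e, t in zip(estimates, true_values)]
    return sum(errors) / len(errors)


def rmse(estimates: Sequence[float], true_values: Sequence[float]) -> float:
    """Root mean squared error of the estimates around the true values."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical trait estimates and generating (true) values for three examinees.
theta_hat = [0.5, -0.5, 1.5]
theta_true = [0.0, 0.0, 1.0]

demo_bias = bias(theta_hat, theta_true)
demo_rmse = rmse(theta_hat, theta_true)
print(demo_bias, demo_rmse)
```

Bias captures systematic over- or underestimation, while RMSE also reflects random error, so two item types can show similar bias yet differ in RMSE.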


Differential Motivation In Remote Educational Assessment: Person-Based Filtering Versus Response-Based Filtering, Sarah Alahmadi, Christine E. DeMars Oct 2021

Department of Graduate Psychology - Faculty Scholarship

Large-scale educational assessments are often considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods can be used to purify the data. We estimated scores from assessment data collected remotely in Spring 2021 six ways, applying examinee-based filtering methods (filtering examinees based on total time) and response-based filtering methods (filtering responses using the effort-moderated IRT model), varying the thresholds selected to separate solution behavior (SB) responses from rapid-guessing behavior (RGB). …


Item Parameter Recovery With And Without The Use Of Priors, Paulius Satkus, Christine E. DeMars Oct 2021

Department of Graduate Psychology - Faculty Scholarship

Marginal maximum likelihood (MML), a common estimation method for IRT models, is not inherently a Bayesian procedure. However, due to estimation difficulties, Bayesian priors are often applied to the likelihood when estimating 3PL models, especially with small samples. Little focus has been placed on choosing the priors for MML estimation. In this study, using sample sizes of 1000 or smaller, not using priors often led to extreme, implausible parameter estimates. Applying prior distributions to the c-parameters alleviated the estimation problems with samples of 1000; priors on both the a-parameters and c-parameters were needed for the samples of …
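
To make the a- and c-parameters mentioned above concrete, the sketch below evaluates the standard three-parameter logistic (3PL) item response function, where a is discrimination, b is difficulty, and c is the lower asymptote. It only evaluates the model; it does not perform MML estimation or apply priors. The function name and parameter values are illustrative assumptions, and the optional scaling constant (often 1.7) is omitted.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (chance of success for very low ability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: when ability equals difficulty, the probability
# lies halfway between the lower asymptote c and 1.
p_mid = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
# Far below the item's difficulty, the probability approaches c.
p_low = p_correct_3pl(theta=-50.0, a=1.0, b=0.0, c=0.2)
print(p_mid, p_low)
```

Because few low-ability examinees may be available in small samples, the data often carry little information about c, which is one reason priors on that parameter can stabilize estimation.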


Investigating The Self In Self-Report, Samantha L. Boddy Aug 2021

Masters Theses, 2020-current

Self-report items are ubiquitous in social sciences and services and medical centers. However, there is some concern about whether people are able to accurately report about themselves. One well-known source of concern is social desirability bias (SDB) or socially desirable responding (SDR), which involves people providing overly-positive responses about themselves that better align with social norms than might their actual attitudes or behaviors. However, several researchers (e.g., Brenner & DeLamater, 2016; Hadaway et al., 1998) suggest that a person’s identity in the area of interest may bias their responding. Specifically, that people interpret and respond to items in terms of …


Identifying Rater Effects For Writing And Critical Thinking: Applying The Many-Facets Rasch Model To The Value Institute, Yelisey A. Shapovalov May 2021

Masters Theses, 2020-current

Performance assessments require examinees to carry out a process or produce a product and can be designed to have high fidelity to real-world application of higher-order skills. As such, performance assessments are highly valued in higher education settings. However, performance assessment is vulnerable to psychometric challenges that threaten the validity of scores due to the subjective nature of the scoring process. Specifically, raters must exercise judgement to provide scores to examinee work, which may be impacted by rater effects, or systematic differences in how raters evaluate performance assessment artifacts. Research has indicated that performance assessment may never be fully free …


Does Coding Method Matter? An Examination Of Propensity Score Methods When The Treatment Group Is Larger Than The Comparison Group, Beth A. Perkins May 2021

Dissertations, 2020-current

In educational contexts, students often self-select into specific interventions (e.g., courses, majors, extracurricular programming). When students self-select into an intervention, systematic group differences may impact the validity of inferences made regarding the effect of the intervention. Propensity score methods are commonly used to reduce selection bias in estimates of treatment effects. In educational contexts, often a larger number of students receive a treatment than not. However, recommendations regarding the application of propensity score methods when the treatment group is larger than the comparison group have not been empirically examined. The current study examined the recommendation to recode the treatment and …


Understanding Motivations To Attend Various Sized Churches: A Study Using Family Communication Patterns, Expectancy Violations, And Anxiety To Predict Church Attendance, Molly Bradshaw May 2021

Masters Theses, 2020-current

Two separate studies were conducted to examine whether communication variables impact religious views and church attendance. For the first study, 228 students from a large Southeastern university completed a web survey. The second study was a web survey of 204 adults that was conducted via Amazon Mechanical Turk (MTURK). Both surveys were sent out to determine one’s motivations to attend a small, medium, or large church using family communication, anxiety, expectations, and religion variables as predictors. Family communication, anxiety, and expectancy variables were positively correlated to many aspects of religious views. Hierarchical regression models utilizing demographics, family communication, anxiety, expectancy …


Getting Caught-Up In The Process: Does It Really Matter?, Nikole Gregg May 2021

Dissertations, 2020-current

Likert items are the most commonly used item-type for measuring attitudes and beliefs. However, responses from Likert items are often plagued with construct-irrelevant variance due to response style behavior. In other words, variability from Likert-item scores can be parsed into: 1) variance pertinent to the construct or trait of interest, and 2) variance irrelevant to the construct or trait of interest. Multidimensional Item Response Theory (MIRT) is an increasingly common modeling approach to parse out information regarding the response style traits and the trait of interest. These MIRT approaches are categorized into threshold-based approaches and response process approaches. An increasingly …


The Effects Of Undesirable Distractors On Estimates Of Ability, Kathryn N. Thompson May 2020

Masters Theses, 2020-current

Distractors, or the incorrect options, are an important part of the multiple-choice item. Previous literature has supported the inclusion of distractors when estimating abilities. While the effects of well-functioning distractors on estimates of ability have been examined, research has neglected to examine the effects of undesirable distractors on estimates of ability. Undesirable distractors are defined as distractors that behave contrary to how test developers expect or want them to behave. For instance, an upper lure distractor is one that high ability examinees select rather than selecting the correct answer. A simulation study was employed to determine these effects by varying undesirable …


Propensity Score Matching And Generalized Boosted Modeling In The Context Of Model Misspecification: A Simulation Study, Briana G. Craig May 2020

Masters Theses, 2020-current

In the absence of random assignment, researchers must consider the impact of selection bias – pre-existing covariate differences between groups due to differences among those entering into treatment and those otherwise unable to participate. Propensity score matching (PSM) and generalized boosted modeling (GBM) are two quasi-experimental pre-processing methods that strive to reduce the impact of selection bias before analyzing a treatment effect. PSM and GBM both examine a treatment and comparison group and either match or weight members of those groups to create new, balanced groups. The new, balanced groups theoretically can then be used as a proxy for the …


Examining The Performance Of The Alignment Method In DIF Analyses, Paulius Satkus, Christine E. DeMars Apr 2020

Department of Graduate Psychology - Faculty Scholarship

The alignment procedure is a new method for multiple group invariance models. An important advantage of alignment over the traditional methods is that alignment does not require full measurement invariance to estimate group means and variances (Muthén & Asparouhov, 2014). Simulation studies have supported that alignment performs adequately in situations when few items are noninvariant (or function differentially across groups – DIF). In most other studies, the tests were simulated to represent attitudinal surveys (e.g., fewer items, continuous data). In this study, we evaluated how alignment would perform with a typical educational cognitive test – 40 items scored dichotomously. Different …


Examining The Effects Of Specifying Bayesian Priors On The Wald's Test For DIF, Paulius Satkus, Christine E. DeMars Oct 2019

Department of Graduate Psychology - Faculty Scholarship

No abstract provided.


An Applied Example Of A Two-Tier Multiple-Group Testlet Model, Paulius Satkus, Christine E. DeMars Oct 2019

Department of Graduate Psychology - Faculty Scholarship

No abstract provided.


Are All Cognitive Items Equally Prone To Position Effects? Exploring The Relationships Among Item Features And Position Effects, Thai Quang Ong May 2019

Dissertations, 2014-2019

One type of context effect is a position effect, which implies parameters of an item are influenced by the position of the item on the test. Researchers often discuss two types of position effects: negative position effects and positive position effects (e.g., Albano, 2013; Debeer & Janssen, 2013). Items exhibiting negative position effects become harder when placed later on the test, whereas items exhibiting positive position effects become easier when placed later on the test. Researchers have primarily examined the underlying causes of position effects through an item or person perspective (e.g., Bulut, 2015; Kingston & Dorans, 1984; Qian, 2014). …


Test Emotions, Value, And Self-Efficacy: A Longitudinal Model Predicting Examinee Effort And Performance On A Low-Stakes Test, Paulius Satkus May 2019

Masters Theses, 2010-2019

The validity of scores from low-stakes tests may be compromised by examinee motivation. Expectancy-Value theory (EV) has been used to frame the antecedents of examinee motivation in low-stakes testing contexts. According to EV theory, the perceived value of the test and the expectancy to succeed on the test directly affect examinee effort, which then affects test performance. Cross-sectional research studies in low-stakes testing contexts offer some support of EV theory. Control-Value theory (CV) serves as another theory to understand motivation toward a task. CV theory encompasses the constructs of expectancy and value from EV theory, but incorporates test emotions as …


Considerations In S-χ2: Rest Score Or Summed Score, Priors, And Violations Of Normality, Christine E. DeMars, Derek Sauder Apr 2019

Department of Graduate Psychology - Faculty Scholarship

The S-χ2 item fit index is one of the few item fit indices that appears to maintain accurate Type I error rates. This study explored grouping examinees by the rest score or summed score, prior distributions for the item parameters, and the shape of the ability distribution. Type I error was slightly closer to the nominal level for the total-score S-χ2 for the longest tests, but power was higher for the rest-score S-χ2 in every condition where power was < 1. Prior distributions reduced the proportion of estimates with extreme standard errors but slightly inflated the Type I error rates in some conditions. When the ability distribution was not normally distributed, integrating over an empirically-estimated distribution yielded Type I error rates closer to the nominal value than integrating over a normal distribution.


The Psychology Of Performance In Elite Youth Soccer Players, Matthew Best Dec 2018

Senior Honors Projects, 2010-2019

This study is a holistic assessment of psychological mindsets, which are one’s attitudes, beliefs, and perceptions, in elite youth male soccer players between the ages of 13 and 18 and the exploration of the relationships between these mindsets and performance outcomes. The mindsets that were assessed were expectancy, growth mindset, value, goals, belongingness, grit, and self-regulation, and the performance outcomes were minutes played, goals scored, and goals allowed. The mindsets were selected through a review of research in education and sport. I conducted Exploratory Factor Analyses (EFA) and Cronbach’s alpha coefficient analyses to assess the validity and reliability of the …
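
As a brief illustration of the reliability coefficient named above, the following sketch computes Cronbach's alpha from a respondents-by-items score table using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name and the tiny data set are hypothetical and are not drawn from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items.")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: three respondents, two items that rise together
# with equal variances, so alpha reaches its maximum of 1.
demo_scores = [[1, 2], [2, 3], [3, 4]]
alpha_demo = cronbach_alpha(demo_scores)
print(alpha_demo)
```

Real scale data would rarely produce a value this high; the example is chosen only so the arithmetic is easy to verify by hand.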


Beyond Motivation: Differences In Score Meaning Between Assessment Conditions, Nikole Gregg May 2018

Masters Theses, 2010-2019

Written communication is a skill necessary not only for the success of undergraduate students but also for post-graduates in the workplace. Furthermore, according to employers, the writing skills of post-graduates tend to be below expectations. Therefore, the assessment of such skills within higher education is in high demand. Written communication assessments tend to be administered in one of two conditions: 1) course embedded and 2) a low-stakes, non-embedded condition. The current study investigated possible construct-irrelevant variance in writing assessment scores by using data from a mid-sized public university in the Mid-Atlantic region of the United States. Specifically, 157 student products were …


Posterior Predictive Model Checking Of Local Misfit For Bayesian Confirmatory Factor Analysis, Chi Hang Au May 2018

Masters Theses, 2010-2019

Posterior predictive model checks (PPMC) are one Bayesian model-data fit approach. Thus far, PPMC for Confirmatory Factor Analytic applications focused primarily on global fit evaluation, ignoring the nuanced information in local misfit diagnostics. This study developed a PPMC approach for local misfit and applied it to a test-taking motivation scale. If the PPMC approach is effective, fit conclusions derived from the PPMC approach should be congruent with the fit conclusions derived from the Frequentist approach. Number of item-pairs flagged as misfitting and number of disagreements were computed to evaluate congruence. Congruence is achieved if the number of item-pairs flagged as …


The Influence Of Covariate Measurement Error On Treatment Effect Estimates And Numeric Balance Diagnostics Following Several Common Methods Of Propensity Score Matching: A Simulation Study, Heather D. Harris May 2018

Dissertations, 2014-2019

In applied intervention studies, researchers frequently aim to make inferences about the impact of a treatment program on participants. However, applied researchers are often faced with threats to the internal validity of their studies, or the extent to which changes in participants’ outcomes can be attributed to the intervention. When researchers are unable to randomly assign study participants to treatment conditions, changes in the intervention outcome might be confounded with systematic differences in participants’ baseline characteristics. Propensity score matching is one technique that allows researchers to account for threats to the internal validity of a study. Specifically, using propensity score …


In Search Of Equality: Developing An Equal Interval Likert Response Scale, Elisabeth M. Spratto May 2018

Dissertations, 2014-2019

Attitude scales are an important component of educational and psychological research. One consideration when seeking to make valid inferences from attitudinal data is the issue of the degree to which response options can be assumed to have equal intervals. Many response options on attitudinal measures may produce ordinal-level data rather than interval. This poses a problem for the statistical tests that may be used, as many analyses assume interval-level data. It also poses an interpretational issue if the conceptual distance between response options is not the same – for example, if a researcher believes that someone who answered Agree differs …


Using Multiple Imputation To Mitigate The Effects Of Low Examinee Motivation On Estimates Of Student Learning, Kelly J. Foelber May 2017

Dissertations, 2014-2019

In higher education, we often collect data in order to make inferences about student learning, and ultimately, in order to make evidence-based changes to try to improve student learning. The validity of the inferences we make, however, depends on the quality of the data we collect. Low examinee motivation compromises these inferences; research suggests that low examinee motivation can lead to inaccurate estimates of examinees’ ability (e.g., Wise & DeMars, 2005). To obtain data that better represent what students know, think, and can do, practitioners must consider, and attempt to negate the effects of, low examinee motivation. The primary purpose …


You Only Live Up To The Standards You Set: An Evaluation Of Different Approaches To Standard Setting, Scott N. Strickman May 2017

Dissertations, 2014-2019

Interpretation of performance in reference to a standard can provide nuanced, finely-tuned information regarding examinee abilities beyond that of just a total score. However, there is a multitude of ways to set performance standards yet little guidance regarding which method operates best and under what circumstances. Traditional methods are the most common approach adopted in practice and heavily involve subject matter experts (SMEs). Two other approaches have been suggested in the literature as alternative ways to set performance standards, although they have yet to be implemented in practice. Data-driven approaches do not involve SMEs but rather rely solely upon statistical …


Retrospective Versus Prospective Measurement Of Examinee Motivation In Low-Stakes Testing Contexts: A Moderated Mediation Model, Aaron J. Myers May 2017

Masters Theses, 2010-2019

Expectancy-value theory applied to examinee motivation suggests examinees’ perceived value of a test indirectly affects test performance via examinee effort. This empirically supported indirect effect, however, is often modeled using importance and effort scores measured after test completion, which does not align with their theoretically specified temporal order. Retrospectively measured importance and effort scores may be influenced by examinees’ test performance, impacting the estimate of the indirect effect. To investigate the effect of timing of measurement, first-year college students were randomly assigned to one of three conditions where (1) importance and effort were measured retrospectively; (2) importance was measured prospectively; …