Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Articles 1 - 30 of 35

Full-Text Articles in Social and Behavioral Sciences

A Novel Examination Of None-Of-The-Above As It Influences Examinee Item Responses, Kathryn N. Thompson May 2023

Dissertations, 2020-current

It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the “degree to which test scores are affected by processes that are extraneous to the test’s intended purpose” (AERA et al., 2014, p. 12). One possible source of construct irrelevant variance is violating item-writing guidelines, such as to “avoid the use of none-of-the-above” in multiple-choice items (Rodriguez, 2016, p. 268).

Numerous studies have been conducted with …


Using IRTrees To Account For Response Style Effects Between Item Formats, Stephanie Leroy May 2023

Masters Theses, 2020-current

Response styles are consistent person-traits that are defined as the tendency to systematically select responses unrelated to the construct being measured (Paulhus, 1991). Response styles introduce construct-irrelevant variance that distorts observed scores on a measure and biases interpretation of the data. The current study looks at midpoint response style (MRS) and extreme response style (ERS). MRS is the tendency to select the midpoint of a rating scale, while ERS is the tendency to select the endpoints of a rating scale. Previous research sought either to account for response style effects or to prevent them; the current study does both. To …
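
As a rough illustration of how IRTree models separate response styles from the trait, the sketch below recodes one 5-point Likert response into three binary pseudo-items: a midpoint node (related to MRS), a direction node (related to the trait), and an extremity node (related to ERS). This particular tree and the function name `irtree_pseudo_items` are assumptions for illustration; the thesis's own tree structure is not described above.

```python
from typing import Optional, Tuple


def irtree_pseudo_items(response: int) -> Tuple[int, Optional[int], Optional[int]]:
    """Recode a 5-point Likert response (1-5) into (midpoint, direction, extremity).

    Hypothetical three-node tree; None marks a node the respondent never reached.
    """
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    if response == 3:
        # Midpoint chosen: the direction and extremity nodes are not reached.
        return (1, None, None)
    direction = 1 if response > 3 else 0        # agree side (4, 5) vs. disagree side (1, 2)
    extremity = 1 if response in (1, 5) else 0  # endpoint vs. inner category
    return (0, direction, extremity)
```

Each pseudo-item can then be given its own latent dimension, so that, in principle, midpoint and extremity tendencies are captured by separate nodes rather than mixed into the trait estimate.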


Double Dosing: Investigating The Utility Of Multiple Priming Questions On Test-Taking Motivation, Mara Mcfadden May 2023

Masters Theses, 2020-current

Priming examinees with questions about intended effort prior to testing has been shown to significantly increase the effort examinees expend, as measured by both self-reported effort and response-time effort. However, this question-behavior effect seems to wear off later in a testing session, specifically when a test is given second in the session. I examined whether administering a second “dose” of the question-behavior effect could counteract the decrease in examinee effort later in a testing session. To evaluate whether “double dosing” could increase examinee effort later in a testing session, I randomly assigned examinees to one of three question conditions prior to completing two low-stakes …


Rapid Response Behavior Before And During The Pandemic, Katarina E. Schaefer May 2022

Masters Theses, 2020-current

Different levels of examinee motivation pose a validity threat to the interpretation of test scores. This problem is heightened in low-stakes, remote testing environments. Although some methods exist to gauge average motivation across a testing session, fewer exist to gauge how motivation fluctuates within a single test. Response times offer one such method. Specifically, rapid response behavior occurs when examinees answer an item quickly without reading or engaging with it. At James Madison University (JMU), students participating in campus-wide Assessment Days typically experienced an in-person, proctored Assessment Day. However, that changed during the pandemic. During the pandemic, examinees participated …
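
Rapid responses are commonly flagged with a per-item time threshold. The sketch below uses a normative-threshold rule: a response is flagged as rapid when its time falls below a fixed fraction of the item's mean response time, with the threshold capped at a maximum number of seconds. The 10% fraction, the 10-second cap, and the function names are illustrative assumptions, not the procedure used in the thesis.

```python
from statistics import mean
from typing import Dict, List


def normative_thresholds(times_by_item: Dict[str, List[float]],
                         fraction: float = 0.10,
                         cap: float = 10.0) -> Dict[str, float]:
    """Per-item threshold: `fraction` of the item's mean response time, capped at `cap` seconds."""
    return {item: min(fraction * mean(times), cap)
            for item, times in times_by_item.items()}


def flag_rapid(times_by_item: Dict[str, List[float]],
               thresholds: Dict[str, float]) -> Dict[str, List[bool]]:
    """True where a response time falls below its item's threshold."""
    return {item: [t < thresholds[item] for t in times]
            for item, times in times_by_item.items()}
```

Comparing the share of flagged responses early versus late in a test is one way such flags could track motivation fluctuations within a single administration.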


Writing While Black: African American Vernacular English (AAVE) And Perceived Writing Performance, Jaylin N. Nesbitt May 2022

Masters Theses, 2020-current

In the education system, there have historically been inequities that have severely disadvantaged Black students academically. One area in which these inequities surface is on writing assessments in the form of lower scores. I argue that because the U.S. education system is centered around Standard American English (SAE), it disadvantages those from different linguistic backgrounds, specifically Black students, as they are most likely to be speakers of African American Vernacular English (AAVE). Although there are theoretical justifications for this, past literature has not empirically tied inequities on writing assessments to Black students’ use of AAVE. The current study used Natural …


The Use Of Complex-Structure Items In Multistage Testing, Paulius Satkus May 2022

Dissertations, 2020-current

When developing tests, measurement experts may prefer simple-structure items because they measure one trait, which simplifies scoring and score interpretation. Conversely, complex-structure items may be preferred to reflect the complexity of multidimensional constructs. The current study sought to address a gap in the multistage testing literature by conducting a simulation study of a hypothetical two-stage adaptive test to compare the performance of simple- and complex-structure items. The findings suggest that with a longer test (60 items), the two types of items performed similarly with respect to bias and RMSE of the trait estimates. For the …
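
Bias and RMSE of trait estimates, the two criteria named above, are usually computed across simulated examinees as the mean error and the root mean squared error between estimated and generating (true) trait values. A minimal sketch, assuming one estimate per simulated examinee; the function name is hypothetical:

```python
import math
from typing import Sequence, Tuple


def bias_and_rmse(true_theta: Sequence[float],
                  est_theta: Sequence[float]) -> Tuple[float, float]:
    """Return (bias, RMSE) of trait estimates relative to generating values."""
    if len(true_theta) != len(est_theta) or len(true_theta) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [e - t for t, e in zip(true_theta, est_theta)]
    bias = sum(errors) / len(errors)                              # signed average error
    rmse = math.sqrt(sum(d * d for d in errors) / len(errors))    # typical error magnitude
    return bias, rmse
```

Bias can be near zero even when RMSE is large, because positive and negative errors cancel, which is why simulation studies typically report both.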


Investigating The Self In Self-Report, Samantha L. Boddy Aug 2021

Masters Theses, 2020-current

Self-report items are ubiquitous in the social sciences, social services, and medical centers. However, there is some concern about whether people are able to report accurately about themselves. One well-known source of concern is social desirability bias (SDB) or socially desirable responding (SDR), which involves people providing overly positive responses about themselves that align better with social norms than their actual attitudes or behaviors do. However, several researchers (e.g., Brenner & DeLamater, 2016; Hadaway et al., 1998) suggest that a person’s identity in the area of interest may bias their responding. Specifically, they suggest that people interpret and respond to items in terms of …


Identifying Rater Effects For Writing And Critical Thinking: Applying The Many-Facets Rasch Model To The Value Institute, Yelisey A. Shapovalov May 2021

Masters Theses, 2020-current

Performance assessments require examinees to carry out a process or produce a product and can be designed to have high fidelity to real-world application of higher-order skills. As such, performance assessments are highly valued in higher education settings. However, performance assessment is vulnerable to psychometric challenges that threaten the validity of scores due to the subjective nature of the scoring process. Specifically, raters must exercise judgement to provide scores to examinee work, which may be impacted by rater effects, or systematic differences in how raters evaluate performance assessment artifacts. Research has indicated that performance assessment may never be fully free …
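
To make "rater effects" concrete, the sketch below computes category probabilities under a rating-scale form of the many-facet Rasch model, in which rater severity enters as an additive facet alongside person ability and item (criterion) difficulty. The parameterization (ability θ, difficulty δ, severity λ, shared thresholds τ) and all values are assumptions for illustration, not estimates from the VALUE Institute data.

```python
import math
from typing import List, Sequence


def mfrm_category_probs(theta: float, item_difficulty: float,
                        rater_severity: float,
                        thresholds: Sequence[float]) -> List[float]:
    """P(score = k), k = 0..len(thresholds), with log-odds of k vs. k-1 = theta - delta - lambda - tau_k."""
    eta = theta - item_difficulty - rater_severity
    log_numerators = [0.0]          # category 0 has an empty sum
    running = 0.0
    for tau in thresholds:
        running += eta - tau        # cumulative sum over steps 1..k
        log_numerators.append(running)
    top = max(log_numerators)       # subtract max for numerical stability
    exps = [math.exp(v - top) for v in log_numerators]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(probs: Sequence[float]) -> float:
    """Model-expected rating implied by the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))
```

Holding the examinee and criterion fixed, a more severe rater (larger λ) shifts probability toward lower categories and lowers the expected rating.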


Does Coding Method Matter? An Examination Of Propensity Score Methods When The Treatment Group Is Larger Than The Comparison Group, Beth A. Perkins May 2021

Dissertations, 2020-current

In educational contexts, students often self-select into specific interventions (e.g., courses, majors, extracurricular programming). When students self-select into an intervention, systematic group differences may impact the validity of inferences made regarding the effect of the intervention. Propensity score methods are commonly used to reduce selection bias in estimates of treatment effects. In educational contexts, more students often receive a treatment than do not. However, recommendations regarding the application of propensity score methods when the treatment group is larger than the comparison group have not been empirically examined. The current study examined the recommendation to recode the treatment and …


Understanding Motivations To Attend Various Sized Churches: A Study Using Family Communication Patterns, Expectancy Violations, And Anxiety To Predict Church Attendance, Molly Bradshaw May 2021

Masters Theses, 2020-current

Two separate studies were conducted to examine whether communication variables impact religious views and church attendance. For the first study, 228 students from a large Southeastern university completed a web survey. The second study was a web survey of 204 adults that was conducted via Amazon Mechanical Turk (MTurk). Both surveys assessed respondents’ motivations to attend a small, medium, or large church, using family communication, anxiety, expectations, and religion variables as predictors. Family communication, anxiety, and expectancy variables were positively correlated with many aspects of religious views. Hierarchical regression models utilizing demographics, family communication, anxiety, expectancy …


Getting Caught-Up In The Process: Does It Really Matter?, Nikole Gregg May 2021

Dissertations, 2020-current

Likert items are the most commonly used item type for measuring attitudes and beliefs. However, responses to Likert items are often plagued with construct-irrelevant variance due to response style behavior. In other words, variability in Likert-item scores can be parsed into: 1) variance pertinent to the construct or trait of interest, and 2) variance irrelevant to the construct or trait of interest. Multidimensional Item Response Theory (MIRT) is an increasingly common modeling approach to parse out information regarding the response style traits and the trait of interest. These MIRT approaches are categorized into threshold-based approaches and response process approaches. An increasingly …


The Effects Of Undesirable Distractors On Estimates Of Ability, Kathryn N. Thompson May 2020

Masters Theses, 2020-current

Distractors, or the incorrect options, are an important part of the multiple-choice item. Previous literature has supported the inclusion of distractors when estimating abilities. While the effects of well-functioning distractors on estimates of ability have been examined, research has neglected to examine the effects of undesirable distractors on estimates of ability. Undesirable distractors are defined as distractors that behave opposite to how test developers expect or want distractors to behave. For instance, an upper lure distractor is one that high-ability examinees select rather than selecting the correct answer. A simulation study was employed to determine these effects by varying undesirable …


Propensity Score Matching And Generalized Boosted Modeling In The Context Of Model Misspecification: A Simulation Study, Briana G. Craig May 2020

Masters Theses, 2020-current

In the absence of random assignment, researchers must consider the impact of selection bias – pre-existing covariate differences between groups due to differences among those entering into treatment and those otherwise unable to participate. Propensity score matching (PSM) and generalized boosted modeling (GBM) are two quasi-experimental pre-processing methods that strive to reduce the impact of selection bias before analyzing a treatment effect. PSM and GBM both examine a treatment and comparison group and either match or weight members of those groups to create new, balanced groups. The new, balanced groups theoretically can then be used as a proxy for the …
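
To make the matching step concrete, the sketch below performs greedy 1:1 nearest-neighbor matching without replacement on propensity scores that are assumed to be estimated already (for example, by logistic regression). The optional caliper, the processing order, and the function name are illustrative choices, not the specific matching algorithm evaluated in the thesis; GBM-based weighting is not shown.

```python
from typing import Dict, List, Optional, Tuple


def greedy_nn_match(treated: Dict[str, float], control: Dict[str, float],
                    caliper: Optional[float] = None) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement on propensity scores."""
    available = dict(control)  # comparison units not yet used
    pairs: List[Tuple[str, str]] = []
    # Process treated units in sorted-id order so results are reproducible.
    for t_id in sorted(treated):
        if not available:
            break  # comparison pool exhausted
        ps = treated[t_id]
        # Closest remaining comparison unit; ties broken by id.
        c_id = min(available, key=lambda c: (abs(available[c] - ps), c))
        if caliper is not None and abs(available[c_id] - ps) > caliper:
            continue  # no acceptable match; this treated unit is dropped
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs
```

Because matching is without replacement, a comparison group smaller than the treatment group leaves some treated units unmatched, which is one reason group-size imbalance matters for these methods.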


Are All Cognitive Items Equally Prone To Position Effects? Exploring The Relationships Among Item Features And Position Effects, Thai Quang Ong May 2019

Dissertations, 2014-2019

One type of context effect is a position effect, in which an item’s parameters are influenced by the item’s position on the test. Researchers often discuss two types of position effects: negative position effects and positive position effects (e.g., Albano, 2013; Debeer & Janssen, 2013). Items exhibiting negative position effects become harder when placed later on the test, whereas items exhibiting positive position effects become easier when placed later on the test. Researchers have primarily examined the underlying causes of position effects from an item or person perspective (e.g., Bulut, 2015; Kingston & Dorans, 1984; Qian, 2014). …
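
One simple way to express the two kinds of position effects is to let an item's Rasch difficulty drift linearly with its position. The linear form, the parameter `gamma`, and the function name below are assumptions for illustration; the thesis may model position effects differently.

```python
import math


def p_correct(theta: float, b: float, position: int, gamma: float) -> float:
    """Rasch probability of a correct response with a hypothetical linear position effect.

    gamma > 0: the item gets harder later in the test (negative position effect).
    gamma < 0: the item gets easier later in the test (positive position effect).
    """
    effective_b = b + gamma * (position - 1)  # difficulty at this position
    return 1.0 / (1.0 + math.exp(-(theta - effective_b)))
```

An item-features analysis would then ask which item characteristics predict the size and sign of `gamma`.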


Test Emotions, Value, And Self-Efficacy: A Longitudinal Model Predicting Examinee Effort And Performance On A Low-Stakes Test, Paulius Satkus May 2019

Masters Theses, 2010-2019

The validity of scores from low-stakes tests may be compromised by examinee motivation. Expectancy-Value theory (EV) has been used to frame the antecedents of examinee motivation in low-stakes testing contexts. According to EV theory, the perceived value of the test and the expectancy to succeed on the test directly affect examinee effort, which then affects test performance. Cross-sectional research studies in low-stakes testing contexts offer some support for EV theory. Control-Value theory (CV) serves as another theory to understand motivation toward a task. CV theory encompasses the constructs of expectancy and value from EV theory, but incorporates test emotions as …


The Psychology Of Performance In Elite Youth Soccer Players, Matthew Best Dec 2018

Senior Honors Projects, 2010-2019

This study holistically assesses psychological mindsets, which are one’s attitudes, beliefs, and perceptions, in elite male youth soccer players between the ages of 13 and 18, and explores the relationships between these mindsets and performance outcomes. The mindsets assessed were expectancy, growth mindset, value, goals, belongingness, grit, and self-regulation, and the performance outcomes were minutes played, goals scored, and goals allowed. The mindsets were selected through a review of research in education and sport. I conducted exploratory factor analyses (EFA) and computed Cronbach’s alpha coefficients to assess the validity and reliability of the …
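
Cronbach's alpha, mentioned above as the reliability index, is α = k/(k − 1) · (1 − Σ item variances / total-score variance) for a scale of k items. A minimal sketch, assuming complete data arranged as one row of item scores per respondent; the function name is hypothetical:

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha; scores[p][i] is person p's response to item i (no missing data)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])  # variance of summed scale scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

The same (sample) variance estimator is used for items and totals, so the ratio is unaffected by the n versus n − 1 choice.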


Beyond Motivation: Differences In Score Meaning Between Assessment Conditions, Nikole Gregg May 2018

Masters Theses, 2010-2019

Written communication is a skill necessary not only for the success of undergraduate students but also for graduates in the workplace. Furthermore, according to employers, the writing skills of graduates tend to fall below expectations. Therefore, the assessment of such skills within higher education is in high demand. Written communication assessments tend to be administered in one of two conditions: 1) course-embedded and 2) low-stakes, non-embedded. The current study investigated possible construct-irrelevant variance in writing assessment scores by using data from a mid-sized public university in the Mid-Atlantic region of the United States. Specifically, 157 student products were …


Posterior Predictive Model Checking Of Local Misfit For Bayesian Confirmatory Factor Analysis, Chi Hang Au May 2018

Masters Theses, 2010-2019

Posterior predictive model checks (PPMC) are one Bayesian approach to evaluating model-data fit. Thus far, PPMC for confirmatory factor analysis applications has focused primarily on global fit evaluation, ignoring the nuanced information in local misfit diagnostics. This study developed a PPMC approach for local misfit and applied it to a test-taking motivation scale. If the PPMC approach is effective, fit conclusions derived from it should be congruent with those derived from the Frequentist approach. The number of item pairs flagged as misfitting and the number of disagreements between the two approaches were computed to evaluate congruence. Congruence is achieved if the number of item-pairs flagged as …
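
For local fit, a discrepancy measure can be computed for each item pair at every posterior draw, once on the observed data and once on data replicated from that draw; the posterior predictive p-value (ppp) is the proportion of draws in which the replicated discrepancy is at least as large as the realized one. A minimal sketch, assuming the per-draw discrepancies (for instance, a residual correlation for one item pair) are already computed; the two-sided .05/.95 cutoffs and function names are illustrative assumptions:

```python
from typing import Sequence


def ppp_value(realized: Sequence[float], replicated: Sequence[float]) -> float:
    """Share of posterior draws where the replicated discrepancy >= the realized discrepancy."""
    if len(realized) != len(replicated) or len(realized) == 0:
        raise ValueError("need one realized and one replicated value per draw")
    exceed = sum(1 for r, p in zip(realized, replicated) if p >= r)
    return exceed / len(realized)


def flag_misfit(ppp: float, lower: float = 0.05, upper: float = 0.95) -> bool:
    """Flag an item pair when its ppp falls in either tail (hypothetical cutoffs)."""
    return ppp < lower or ppp > upper
```

Counting flagged item pairs under this rule, and comparing them with pairs flagged by a Frequentist residual diagnostic, gives the kind of congruence tally described above.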


The Influence Of Covariate Measurement Error On Treatment Effect Estimates And Numeric Balance Diagnostics Following Several Common Methods Of Propensity Score Matching: A Simulation Study, Heather D. Harris May 2018

Dissertations, 2014-2019

In applied intervention studies, researchers frequently aim to make inferences about the impact of a treatment program on participants. However, applied researchers are often faced with threats to the internal validity of their studies, or the extent to which changes in participants’ outcomes can be attributed to the intervention. When researchers are unable to randomly assign study participants to treatment conditions, changes in the intervention outcome might be confounded with systematic differences in participants’ baseline characteristics. Propensity score matching is one technique that allows researchers to account for threats to the internal validity of a study. Specifically, using propensity score …


In Search Of Equality: Developing An Equal Interval Likert Response Scale, Elisabeth M. Spratto May 2018

Dissertations, 2014-2019

Attitude scales are an important component of educational and psychological research. One consideration when seeking to make valid inferences from attitudinal data is the degree to which response options can be assumed to have equal intervals. Many response options on attitudinal measures may produce ordinal-level rather than interval-level data. This poses a problem for the statistical tests that may be used, as many analyses assume interval-level data. It also poses an interpretational issue if the conceptual distance between response options is not the same: for example, if a researcher believes that someone who answered Agree differs …


Using Multiple Imputation To Mitigate The Effects Of Low Examinee Motivation On Estimates Of Student Learning, Kelly J. Foelber May 2017

Dissertations, 2014-2019

In higher education, we often collect data in order to make inferences about student learning, and ultimately, in order to make evidence-based changes to try to improve student learning. The validity of the inferences we make, however, depends on the quality of the data we collect. Low examinee motivation compromises these inferences; research suggests that low examinee motivation can lead to inaccurate estimates of examinees’ ability (e.g., Wise & DeMars, 2005). To obtain data that better represent what students know, think, and can do, practitioners must consider, and attempt to negate the effects of, low examinee motivation. The primary purpose …


You Only Live Up To The Standards You Set: An Evaluation Of Different Approaches To Standard Setting, Scott N. Strickman May 2017

Dissertations, 2014-2019

Interpretation of performance in reference to a standard can provide nuanced, finely-tuned information regarding examinee abilities beyond that of just a total score. However, there is a multitude of ways to set performance standards yet little guidance regarding which method operates best and under what circumstances. Traditional methods are the most common approach adopted in practice and heavily involve subject matter experts (SMEs). Two other approaches have been suggested in the literature as alternative ways to set performance standards, although they have yet to be implemented in practice. Data-driven approaches do not involve SMEs but rather rely solely upon statistical …


Retrospective Versus Prospective Measurement Of Examinee Motivation In Low-Stakes Testing Contexts: A Moderated Mediation Model, Aaron J. Myers May 2017

Masters Theses, 2010-2019

Expectancy-value theory applied to examinee motivation suggests examinees’ perceived value of a test indirectly affects test performance via examinee effort. This empirically supported indirect effect, however, is often modeled using importance and effort scores measured after test completion, which does not align with their theoretically specified temporal order. Retrospectively measured importance and effort scores may be influenced by examinees’ test performance, impacting the estimate of the indirect effect. To investigate the effect of timing of measurement, first-year college students were randomly assigned to one of three conditions where (1) importance and effort were measured retrospectively; (2) importance was measured prospectively; …


Student Learning Gains In Higher Education: A Longitudinal Analysis With Faculty Discussion, Catherine E. Mathers May 2017

Masters Theses, 2010-2019

Student learning is the primary desired outcome of a college education. To understand how educational programming and curricula affect students, colleges and universities must collect evidence of student learning gain. In this study, a longitudinal design was employed to investigate how a math and science general education curriculum impacted college students’ quantitative and scientific reasoning. Quantitative and scientific reasoning gain scores were computed and predicted from personal (i.e., prior knowledge, gender) and curriculum (i.e., number of completed courses in the domain) characteristics to uncover what factors relate to learning gain. Collapsing across personal and curriculum variables, gain scores were moderate …


Examining The Type I Error And Power Of 18 Common Post-Hoc Comparison Tests, Derek Sauder May 2017

Masters Theses, 2010-2019

Researchers utilizing either experimental or quasi-experimental designs often want to compare group means. However, with more than two groups, conducting multiple pairwise comparisons of group means may inflate the Type I error rate, the probability of wrongly rejecting a true null hypothesis. Researchers often employ analysis of variance (ANOVA) methodology to compare more than two group means. Post-hoc comparison procedures (PCPs) are used to indicate which group means differ following a significant ANOVA. SPSS provides 18 options for PCPs. The purpose of this study was to determine which PCP provides the best power while maintaining Type I error control when assumptions of ANOVA …
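
The inflation can be illustrated with a back-of-the-envelope formula: with k groups there are m = k(k − 1)/2 pairwise comparisons, and if each is tested at level α and the tests are treated as independent, the probability of at least one Type I error is 1 − (1 − α)^m. Pairwise comparisons among means are not actually independent, so this is only an approximation; the function name is hypothetical.

```python
def familywise_error(alpha: float, n_groups: int) -> float:
    """Approximate familywise Type I error for all pairwise comparisons, assuming independence."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    m = n_groups * (n_groups - 1) // 2  # number of pairwise comparisons
    return 1 - (1 - alpha) ** m
```

With α = .05 and four groups (six comparisons), the approximation gives about .265, which is the kind of inflation post-hoc procedures are designed to control.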


Applying Solution Behavior Thresholds To A Noncognitive Measure To Identify Rapid Responders: An Empirical Investigation, Mary M. Johnston May 2016

Dissertations, 2014-2019

Noncognitive measures are increasingly being used for accountability purposes in higher education (e.g., O. L. Liu, Frankel, & Roohr, 2014). Because these measures are often collected under low-stakes conditions, there is a concern that students do not put forth their best effort when responding, which is problematic given that previous research has found noneffortful responding can negatively impact the validity of results (e.g., Barry & Finney, 2009; Meade & Craig, 2012; Swerdzewski, Harmes, & Finney, 2011). Consequently, there is a need to identify students displaying low effort on low-stakes noncognitive measures. One method, which is based on response time and can discreetly …
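
Solution behavior thresholds classify each response as solution behavior (time at or above the item's threshold) or rapid responding (time below it); an examinee-level index can then be formed as the proportion of items showing solution behavior. The sketch below assumes the thresholds are already chosen; the function name and example values are hypothetical.

```python
from typing import Sequence


def response_time_effort(times: Sequence[float], thresholds: Sequence[float]) -> float:
    """Proportion of an examinee's items answered with solution behavior (time >= threshold)."""
    if len(times) != len(thresholds) or len(times) == 0:
        raise ValueError("need one threshold per response time")
    solution = [1 if t >= thr else 0 for t, thr in zip(times, thresholds)]
    return sum(solution) / len(solution)
```

An examinee whose index falls below some chosen cutoff could then be flagged as a rapid responder; where to set that cutoff on a noncognitive measure is itself an empirical question.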


Examining Latent Change Classes: An Application Of Factor Mixture Modeling To Change Scores, Thai Q. Ong May 2016

Masters Theses, 2010-2019

Although change scores are used in a variety of statistical methods (e.g., analysis of variance and regression), there is a lack of application of latent variable modeling methods to change scores. This thesis provides a detailed description of two latent variable modeling methods applied to change scores: factor analysis of change scores and change score factor mixture modeling. To illustrate advantages of these methods, both were applied to change score data from undergraduates. Students responded to sense of identity items during a university-wide assessment day on two occasions, once as incoming freshmen and again as second-semester sophomores. Change scores were …


The Effect Of Anchoring Vignettes On Factor Structures: Student Effort As An Example, Carolyn A. Miesen May 2016

Masters Theses, 2010-2019

Anchoring vignettes are used as a methodological technique for removing differential interpretation of response categories (DIRC) from scores on subjective self-report measures (King, Murray, Salomon, & Tandon, 2004). This technique requires participants to read one or more short scenarios, or vignettes, designed to represent various levels of a construct. Vignette ratings are used as an indication of DIRC, which is a source of differential item functioning (DIF). Prior research primarily used indirect methods for evaluating vignette quality. In response, the present set of studies proposes using invariance testing as a more direct evaluation of how the use of anchoring vignettes …
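
One way vignette ratings can adjust self-reports is a nonparametric recode in the spirit of King et al. (2004): the self-rating is re-expressed by where it falls relative to the respondent's own vignette ratings, giving 2J + 1 ordered categories for J vignettes. The sketch below assumes the respondent rated the vignettes in strictly increasing order (ties and order violations need extra rules not shown); the function name is hypothetical.

```python
from typing import Sequence


def vignette_relative_score(self_rating: float, vignette_ratings: Sequence[float]) -> int:
    """Position of a self-rating relative to the respondent's vignette ratings (1 .. 2J+1)."""
    z = list(vignette_ratings)
    if any(a >= b for a, b in zip(z, z[1:])):
        raise ValueError("sketch assumes strictly increasing vignette ratings")
    c = 1
    for zj in z:
        if self_rating < zj:
            return c          # below this vignette
        if self_rating == zj:
            return c + 1      # tied with this vignette
        c += 2                # above this vignette; move to the next pair of categories
    return c                  # above every vignette
```

Because each respondent is compared with their own vignette ratings, two people who use the response scale differently can still receive the same recoded score.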


The Effects Of A Planned Missingness Design On Examinee Motivation And Psychometric Quality, Matthew S. Swain May 2015

Dissertations, 2014-2019

Assessment practitioners in higher education face increasing demands to collect assessment and accountability data to make important inferences about student learning and institutional quality. The validity of these high-stakes decisions is jeopardized, particularly in low-stakes testing contexts, when examinees do not expend sufficient effort to perform well on the test. This study introduced planned missingness as a potential solution. In planned missingness designs, data on all items are collected, but each examinee completes only a subset of items, thus increasing data collection efficiency, reducing examinee burden, and potentially increasing data quality. The current scientific reasoning test served as the Long …
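
A common planned missingness layout is a three-form design: items are split into a common block X and three blocks A, B, and C, and each form contains X plus two of the three blocks, so every pair of blocks is observed together on some form. The sketch below assigns examinees to such forms after a seeded shuffle; the block labels, the three-form layout, and the function name are illustrative assumptions, not the design used in the study.

```python
import random
from typing import Dict, Iterable, Tuple

# Hypothetical three-form design: every form shares block X and omits one of A, B, C.
FORMS: Tuple[Tuple[str, ...], ...] = (("X", "A", "B"), ("X", "A", "C"), ("X", "B", "C"))


def assign_forms(examinee_ids: Iterable[str], seed: int = 0) -> Dict[str, Tuple[str, ...]]:
    """Randomly order examinees, then deal forms round-robin so form sizes stay balanced."""
    order = list(examinee_ids)
    random.Random(seed).shuffle(order)
    return {eid: FORMS[i % len(FORMS)] for i, eid in enumerate(order)}
```

Each examinee answers about two thirds of the non-common items, and the intentionally missing responses can later be handled with methods such as multiple imputation or full-information maximum likelihood.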


Addressing Serial-Order And Negative-Keying Effects: A Mixed-Methods Study, Jerusha J. Gerstner May 2015

Dissertations, 2014-2019

Researchers have studied item serial-order effects on attitudinal instruments by considering how item-total correlations differ based on the item’s placement within a scale (e.g., Hamilton & Shuminsky, 1990). In addition, other researchers have focused on item negative-keying effects on attitudinal instruments (e.g., Marsh, 1996). Researchers consistently have found that negatively-keyed items relate to one another above and beyond their relationship to the construct intended to be measured. However, only one study (i.e., Bandalos & Coleman, 2012) investigated the combined effects of serial-order and negative-keying on attitudinal instruments. Their brief study found some improvements in fit when attitudinal items were presented …