Open Access. Powered by Scholars. Published by Universities.®

Education Commons

Item Response Theory

Articles 1 - 23 of 23

Full-Text Articles in Education

Assessing Differential Item Functioning And Differential Test Functioning In An Academic Motivation Scale Using Item Response Theory Methods, Gerald J. Bean Dec 2022

International Journal of School Social Work

Social work researchers and practitioners who use measurement instruments to make data-informed decisions need to ensure those decisions are based on items and scales that are free from possible bias or undesirable differential functioning. In this study, we provide an example of how a set of Item Response Theory (IRT) statistical methods and tools can be used by social work measurement researchers to assess differential item functioning (DIF) and differential test (scale) functioning (DTF). For this example, we explored possible differential functioning by race, gender, and family composition in the Academic Motivation Scale (AMS), a scale developed for use by school social workers. The data …
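As a rough illustration of how differential test functioning can be quantified, the sketch below compares test characteristic curves (expected scale scores as a function of the latent trait theta) computed from item parameters estimated separately in a reference group and a focal group. It assumes a 2PL model with dichotomous items, invented item parameters that are already linked to a common scale, and a uniform theta grid. The abstract does not detail the specific methods or tools the study used, so this is not a reproduction of them.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing/answering an item correctly under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Test characteristic curve: expected sum score at theta for (a, b) item parameters."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def signed_dtf(ref_items, foc_items, thetas):
    """Mean reference-minus-focal difference in expected score over a grid of theta values.

    Positive values mean the reference group is expected to score higher at the same theta.
    A uniform grid is a simplification; a focal-group ability density could be used instead.
    """
    diffs = [expected_score(t, ref_items) - expected_score(t, foc_items) for t in thetas]
    return sum(diffs) / len(diffs)

# Hypothetical 3-item scale calibrated separately in each group; only item 2 differs.
reference = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.8)]
focal = [(1.2, -0.5), (0.9, 0.4), (1.5, 0.8)]
grid = [-3.0 + 0.1 * i for i in range(61)]  # theta from -3 to 3

print(round(signed_dtf(reference, focal, grid), 3))
```

Because only item 2's difficulty differs between groups, the positive DTF value here comes entirely from that one item: a DIF analysis would localize the problem, while the DTF summary shows its net effect on expected scale scores.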


Investigation Into The Measurement Of Student Ability: Refinements, Improvements And Enhanced Understanding Of Factors That Influence Performance, David Schreurs May 2022

Theses and Dissertations

The following series of experiments was designed and implemented with the goal of understanding and improving undergraduate students' understanding of chemistry. To accurately assess changes in students' chemistry understanding, the first chapter focuses on how a student's chemistry proficiency can be measured most effectively. This includes practical recommendations for the average instructor as well as advanced methods more appropriate for researchers who require highly precise scores. With a better understanding of how to measure student proficiency, the next step was to improve student understanding through various types of feedback following assessments. On top of establishing a …


A Feasibility Study For Developing A Computerized Adaptive Form Of Raven’s Colored Progressive Matrices Test For Omani Children Based On The Item Response Theory, Ehab Mohammed Omara, علي مهدي كاظم Oct 2020

International Journal for Research in Education

The current study investigated the feasibility of developing a computerized adaptive form of Raven’s Colored Progressive Matrices test, one of the most important culture-free intelligence tests, using Item Response Theory. The test consists of 36 items divided into three groups. The data used in the current study were adapted from the study by Kadhim et al. (2008), which included 1,042 subjects aged 5 to 10, of both genders, from 11 Omani governorates. After the Item Response Theory assumptions were shown to be met, the mirtCAT package was used to evaluate a computerized adaptive form of the test. Raven’s test items …
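Computerized adaptive tests commonly choose each next item to be maximally informative at the examinee's current ability estimate. The sketch below shows maximum Fisher information item selection under a 2PL model with a hypothetical five-item bank. It omits ability updating and stopping rules, and it does not reproduce the mirtCAT package's interface or the selection settings used in this study, which the abstract does not report.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Return the index of the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))

# Hypothetical bank of (a, b) pairs; equal slopes make it equivalent to a Rasch bank.
bank = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
print(next_item(0.0, bank, administered=set()))  # 2: difficulty closest to theta_hat
print(next_item(0.8, bank, administered={2}))    # 3
```

With equal slopes, item information peaks where b equals theta, so selection reduces to picking the unused item with the closest difficulty; unequal slopes would additionally favor more discriminating items.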


Estimation Of College Students' Ability On Real Analysis Course Using Rasch Model, Isnani Isnani, Wikan Budi Utami, Purwo Susongko, Herani Tri Lestiani Dec 2019

REID (Research and Evaluation in Education)

This study aims to estimate the difficulty level of essay test items and the accuracy of students' ability estimates on a Real Analysis essay test using the Rasch model, with the QUEST program and the eRm package for R 3.0.3. The population in this study was all students of the Department of Mathematics Education, Universitas Pancasakti Tegal, in the 2016/2017 academic year who were enrolled in the Real Analysis course. The data were analyzed using the eRm package for R 3.0.3 and the QUEST program. Students' ability was estimated from their results on the final exam of the first Real Analysis course. …


Examining Validity And Reliability Of A Mathematics Assessment Tool For K-2 Students, Carl F. Siebert, Jonathan L. Brendefur Apr 2018

Curriculum, Instruction, and Foundational Studies Faculty Publications and Presentations

The Primary Math Assessment (PMA) tool is increasingly being used in multiple districts in a northwestern state. The PMA provides both screening and diagnostic information in six domains to assess mathematical proficiency in young students in their early educational years. A previous study using multidimensional Rasch analyses found support for the PMA’s six-dimensional theoretical framework, and that the PMA is a reliable mathematics assessment for early grades. This study extended the examination of a Rasch model, implementing exploratory and confirmatory factor analysis, Item Response Theory, and Differential Item Functioning analyses. In doing so, this study found an IRT 2-PL model …


Characteristics And Equation Of Accounting Vocational Theory Trial Test Items For Vocational High Schools By Subject-Matter Teachers' Forum, Dian Normalitasari Purnama Dec 2017

REID (Research and Evaluation in Education)

This study is aimed at: (1) understanding the characteristics of Accounting Vocational Theory trial test items using Item Response Theory, and (2) determining the horizontal equating of the Accounting Vocational Theory trial exam instruments. This was exploratory-descriptive research with eleventh-grade students as subjects. The research objects were the test instruments and the responses of students from six schools selected through stratified random sampling. The data analysis employed review sheets and the BILOG program for the two-parameter logistic (2PL) Item Response Theory model. The findings were as follows. (1) The test item review of test packages A and B found 37 good quality …


Local Item Response Theory For Detection Of Spatially Varying Differential Item Functioning, Samantha Robinson Dec 2017

Graduate Theses and Dissertations

Mappings of spatially-varying Item Response Theory (IRT) parameters are proposed, allowing for visual investigation of potential Differential Item Functioning (DIF) based upon geographical location without need for pre-specified groupings and before any confirmatory DIF testing. This proposed model is a localized approach to IRT modeling and DIF detection that provides a flexible framework, with current emphasis being on 1PL/Rasch and 2PL models. Applications to both simulated and empirical survey data, utilizing a box-car kernel weighting scheme with several fixed bandwidths on irregular spatial lattices, are presented both to demonstrate the methodology and to illustrate the benefit of localized IRT modeling. …


Development And Validation Of The Statistics Assessment Of Graduate Students, Dammika Lakmal Walpitage Dec 2016

Doctoral Dissertations

This study developed the Statistics Assessment of Graduate Students (SAGS) instrument and established its preliminary item characteristics, reliability, and validity evidence. Although a few assessments exist for measuring different aspects of statistical cognition, these existing assessments have numerous limitations. The SAGS instrument was developed using a Rasch modeling approach to create a new measure of the statistical research methodology knowledge of graduate students in education and other behavioral and social sciences. Thirty-five multiple-choice questions were written with stems representing applied research situations and response options distinguishing between appropriate uses of various statistical tests or procedures. A focus …
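Rasch modeling places persons and items on a common logit scale. The minimal sketch below estimates one examinee's ability by maximum likelihood (Newton-Raphson), treating item difficulties as already known; in practice those difficulties must themselves be estimated first. The difficulties and response patterns are invented for illustration and are not SAGS data.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response to an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_theta_mle(responses, difficulties, tol=1e-8, max_iter=50):
    """Maximum-likelihood ability estimate (logits) with item difficulties treated as known."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("the MLE is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        ps = [rasch_p(theta, b) for b in difficulties]
        gradient = raw - sum(ps)                      # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in ps)  # negative second derivative
        step = gradient / information                 # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    return theta

difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties
theta_a = rasch_theta_mle([1, 1, 1, 0, 0], difficulties)
theta_b = rasch_theta_mle([0, 0, 1, 1, 1], difficulties)
print(round(theta_a, 3), round(theta_b, 3))  # identical: both raw scores are 3
```

The two patterns receive the same estimate because, under the Rasch model, the raw score is a sufficient statistic for ability: which items were answered correctly does not matter, only how many.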


An Examination Of The Psychometric Properties Of The Student Risk Screening Scale For Internalizing And Externalizing Behaviors: An Item Response Theory Approach, Sara E. Moulton Dec 2016

Theses and Dissertations

This research study examined the psychometric properties of the Student Risk Screening Scale for Internalizing and Externalizing Behaviors (SRSS-IE) using Item Response Theory (IRT) methods among a sample of 2,122 middle school students. The SRSS-IE is a recently revised screening instrument aimed at identifying students who are potentially at risk for emotional and behavioral disorders (EBD). There are two studies included in this research. Study 1 utilized the Nominal Response and Generalized Partial Credit models of IRT to evaluate items from the SRSS-IE in terms of the degree to which the response options for each item functioned as intended by …


The Effects Of Scaling On Trends Of Development: Classical Test Theory And Item Response Theory, Weldon Z. Smith Apr 2016

College of Education and Human Sciences: Dissertations, Theses, and Student Research

The scale metrics used in educational testing are often arbitrary, and this can affect how scores are interpreted. Classical test theory sum scores and item response theory estimates both measure the same underlying dimension, but differences between the two scales may make one preferable to the other when interpreting data. A mismatch between individual ability and test difficulty can further make it hard to interpret trends of development in longitudinal data correctly. A previous limited simulation by Embretson (2007) demonstrated that classical test theory sum scores result in misinterpretation of linear trends of development, and that item …
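The scaling issue described above can be illustrated with a test characteristic curve. Under an IRT model the expected sum score is a nonlinear function of ability (theta), so equal gains in theta need not produce equal gains in sum score. The sketch assumes a hypothetical 10-item Rasch test whose items are all much easier than the examinees (every b = -2). It does not reproduce Embretson's (2007) simulation or this study's design.

```python
import math

def expected_sum_score(theta, difficulties):
    """Rasch test characteristic curve: expected number-correct score at theta."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

easy_test = [-2.0] * 10       # hypothetical test that is too easy for the group
thetas = [0.0, 1.0, 2.0, 3.0] # equally spaced (linear) growth in ability
scores = [expected_sum_score(t, easy_test) for t in thetas]
gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
print([round(s, 2) for s in scores])  # [8.81, 9.53, 9.82, 9.93]
print([round(g, 2) for g in gains])   # [0.72, 0.29, 0.11]: shrinking gains near the ceiling
```

Linear growth in theta appears as decelerating growth in sum scores, which is the kind of misreading of developmental trends that a mismatch between ability and test difficulty can cause.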


Longitudinal Analysis For Ordinal Data Through Multilevel And Item Response Modeling: Applications To Child Observation Record (COR), Zijia Li Jan 2016

Theses and Dissertations--Educational, School, and Counseling Psychology

The large proportion of children from low-SES backgrounds and the widening achievement gap between disadvantaged children and their more advantaged peers are drawing national attention to investment in early childhood education (Reid, Kagan, Hilton, & Potter, 2015). As evidenced in many research studies (e.g., Heckman & Masterov, 2007), early childhood experiences are critical to children's development. High-quality early learning is necessary for children, especially disadvantaged children, to narrow the school readiness gap and to build a foundation for more advanced academic, social, and cognitive skills. Importantly, high-quality assessment tools (i.e., reliability and validity evidence supporting the …


Evaluating The Impact Of Construct Shift On Item Parameter Invariance, Test Equating And Proficiency Estimates, Xueming Li Nov 2015

Doctoral Dissertations

The Common Core State Standards in English Language Arts and Mathematics for grades K to 12 were introduced in 2009 and at one point had been adopted by 45 U.S. states. The new standards have created national curricula in these two subject areas. Along with this reform, new assessment systems have also been developed. Many of these new tests show signs of being more multidimensional than the tests they replaced because of their use of new item formats and their assessment of higher-level thinking skills and various performance skills. In the short term at …


Measuring Student Growth In K–12 Schools Using Item Response Theory Within Structural Equation Models, Kenneth Lee Thompson Aug 2015

Dissertations

The use of test-based accountability has expanded beyond measurements of school effectiveness to include measurements of teacher effectiveness. However, whereas the use of test-based accountability has expanded, the understanding of the statistical methodologies used in accountability systems has not kept pace. Currently, Student Growth Percentiles and value-added modeling are the most prevalent methodologies for estimating annual student growth. Each of these methodologies is regression-based and relies on scale scores from standardized assessments. Given the prevalence of Item Response Theory in statewide assessment programs, these scale scores often result from Item Response Theory scaling practices. Grounded in earlier work of Brockman …


Assessing The Validity And Reliability Of Dichotomous Test Results Using Item Response Theory On A Group Of First Year Engineering Students, Edmund Nevin, Avril Behan, Gavin Duffy, Stephanie Farrell, Rachel Harding, Robert Howard, Aaron Mac Raighne, Brian Bowe Jul 2015

Conference papers

Traditional measurement instruments employed to assess the performance of students studying on STEM (Science, Technology, Engineering and Mathematics) related programmes typically involve classification based on final scores. The validity and reliability of these instruments and test forms are important considerations when assessing whether a student understands content and, if not, where and in what way they are struggling. The aim of this study is to examine, validate and analyse the test results of first-year engineering students at an Institute of Higher Learning in Ireland who took the Purdue Spatial Visualisation Test of Rotation (PSVT:R). Results obtained were analysed using the …


Development Of The Biostatistics And Clinical Epidemiology Skills Assessment For Medical Residents, Patrick Brian Barlow May 2014

Doctoral Dissertations

This study developed the Biostatistics and Clinical Epidemiology Skills (BACES) assessment and established its preliminary item characteristics and validity evidence. Unlike previous instruments, the BACES assessment was developed and tested using an item response theory (IRT) approach to measurement to create a new, adaptive biostatistics and clinical epidemiology knowledge assessment for graduate medical professionals. Thirty multiple-choice questions were written to focus on interpreting relevant examples of clinical epidemiology and statistical methods. A four-person expert panel reviewed these items for content validity. After this review, the BACES assessment was administered to 147 medical residents across three academic medical centers. Results …


Effect Of Automatic Item Generation On Ability Estimates In A Multistage Test, Kimberly F. Colvin Apr 2014

Doctoral Dissertations

In adaptive testing, including multistage adaptive testing (MST), the psychometric properties of the test items are needed to route examinees through the test. However, if a testing program uses items that are generated automatically at the time of administration, there is no opportunity to calibrate them, so their psychometric properties must be predicted. This simulation study evaluates the accuracy with which examinees' abilities can be estimated when automatically generated items, specifically item clones, are used in MSTs. The behavior of the clones in this study was modeled according to the results of Sinharay and Johnson’s (2008) investigation …


Examination Of The Application Of Item Response Theory To The Angoff Standard Setting Procedure, Jerome Cody Clauser Sep 2013

Open Access Dissertations

Establishing valid and reliable passing scores is a vital activity for any examination used to make classification decisions. Although there are many different approaches to setting passing scores, this thesis focuses specifically on the Angoff standard setting method. The Angoff method is a test-centered approach, based on classical test theory, to estimating performance standards. In the Angoff method, each judge estimates the proportion of minimally competent examinees who will answer each item correctly. These values are summed across items and averaged across judges to arrive at a recommended passing score. Unfortunately, research has shown that the Angoff method has a …
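The classical Angoff computation described above reduces to simple arithmetic: each judge's item ratings are summed to give that judge's recommended raw cut score, and those sums are averaged across judges. The ratings below are invented for three hypothetical judges and four items, and the sketch covers only the classical procedure, not the IRT application that the thesis investigates.

```python
def angoff_cut_score(ratings):
    """Classical Angoff passing score.

    ratings[j][i] is judge j's estimate of the proportion of minimally competent
    examinees who would answer item i correctly.
    """
    judge_totals = [sum(judge) for judge in ratings]  # sum across items, per judge
    return sum(judge_totals) / len(judge_totals)       # average across judges

ratings = [
    [0.6, 0.7, 0.5, 0.8],  # judge 1 (total 2.6)
    [0.5, 0.8, 0.4, 0.9],  # judge 2 (total 2.6)
    [0.7, 0.6, 0.5, 0.7],  # judge 3 (total 2.5)
]
print(round(angoff_cut_score(ratings), 2))  # 2.57 raw-score points out of 4
```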


The Development And Validation Of A Spanish Elicited Imitation Test Of Oral Language Proficiency For The Missionary Training Center, Carrie A. Thompson Jun 2013

Theses and Dissertations

The Missionary Training Center (MTC), affiliated with the Church of Jesus Christ of Latter-day Saints, needs a reliable and cost-effective way to measure the oral language proficiency of missionaries learning Spanish. The MTC needed to measure incoming missionaries' Spanish language proficiency for training and classroom assignment as well as to provide exit measures of institutional progress. Oral proficiency interviews and semi-direct assessments require highly trained raters, which is costly and time-consuming. The Elicited Imitation (EI) test is a computerized, automated test that measures oral language proficiency by having the participant hear and repeat utterances of varying syllable length in …


Investigating How Equating Guidelines For Screening And Selecting Common Items Apply When Creating Vertically Scaled Elementary Mathematics Tests, Maria Assunta Hardy Dec 2011

Theses and Dissertations

Guidelines to screen and select common items for vertical scaling have been adopted from equating. Differences between vertical scaling and equating suggest that these guidelines may not apply to vertical scaling in the same way that they apply to equating. For example, in equating the examinee groups are assumed to be randomly equivalent, but in vertical scaling the examinee groups are assumed to possess different levels of proficiency. Equating studies that examined the characteristics of the common-item set stress the importance of careful item selection, particularly when groups differ in ability level. Since in vertical scaling cross-level ability differences are …


A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, And Effect Size, Phyllis Lorena Garrett Aug 2009

Educational Policy Studies Dissertations

The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur simultaneously in real assessment situations. This study investigated the Type I error and power of several DIF detection methods and methods of handling missing data for polytomous …


The Impact Of Multidimensionality On The Detection Of Differential Bundle Functioning Using SIBTEST, Terris Raiford-Ross Feb 2008

Educational Policy Studies Dissertations

In response to public concern over fairness in testing, conducting a differential item functioning (DIF) analysis is now standard practice for many large-scale testing programs (e.g., Scholastic Aptitude Test, intelligence tests, licensing exams). As highlighted by the Standards for Educational and Psychological Testing manual, the legal and ethical need to avoid bias when measuring examinee abilities is essential to fair testing practices (AERA-APA-NCME, 1999). Likewise, the development of statistical and substantive methods of investigating DIF is crucial to the goal of designing fair and valid educational and psychological tests. Douglas, Roussos and Stout (1996) introduced the concept of item bundle …


Students Ranking, Based On Their Abilities On Objective Type Test: Comparison Of CTT And IRT, Amir Zaman, Atiq-Ur-Rehman Kashmiri, Mamoon Mubarak, Arshad Ali Jan 2008

EDU-COM International Conference

This paper investigates the idea of differential ability for ranking students on a multiple-choice physics test at the secondary level. It highlights the weakness of Classical Test Theory (CTT) in measuring ability on a continuous scale and, consequently, in ranking students by ability. In CTT, a correct answer to a difficult question earns the same credit as a correct answer to an easy one, which is not the case in Item Response Theory (IRT). Moreover, in CTT two students with equal raw scores have the same ranking, while in …
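The abstract does not say which IRT model the authors used. As an illustration under an assumed two-parameter logistic (2PL) model with invented item parameters, the sketch below shows how two students with the same raw score can receive different ability estimates, because 2PL maximum-likelihood scoring weights each correct answer by the item's discrimination a. Under the Rasch model, by contrast, equal raw scores yield equal estimates.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def theta_mle_2pl(responses, items, tol=1e-8, max_iter=50):
    """Newton-Raphson ML ability estimate; item (a, b) parameters treated as known.

    Assumes a mixed response pattern (not all 0s or all 1s), where the MLE is finite.
    """
    theta = 0.0
    for _ in range(max_iter):
        ps = [p_2pl(theta, a, b) for a, b in items]
        gradient = sum(a * (x - p) for x, p, (a, _) in zip(responses, ps, items))
        information = sum(a * a * p * (1.0 - p) for p, (a, _) in zip(ps, items))
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical (a, b) parameters: later items are harder and more discriminating.
items = [(0.5, -1.0), (0.8, -0.5), (1.2, 0.0), (1.6, 0.5), (2.0, 1.0)]
student_1 = [1, 1, 1, 0, 0]
student_2 = [0, 0, 1, 1, 1]

print(sum(student_1), sum(student_2))  # 3 3: identical raw scores, tied under CTT
print(round(theta_mle_2pl(student_1, items), 2), round(theta_mle_2pl(student_2, items), 2))
```

Student 2 ranks higher here because the items they answered correctly carry larger a values, not because those items are harder; if all slopes were equal, the two estimates would coincide.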


The Strength Of Multidimensional Item Response Theory In Exploring Construct Space That Is Multidimensional And Correlated, Steven Gerry Spencer Dec 2004

Theses and Dissertations

This dissertation compares the parameter estimates obtained from two item response theory (IRT) models: the 1-PL IRT model and the MC1-PL IRT model. Several scenarios were explored in which both unidimensional and multidimensional item-level and person-level data were used to generate the item responses. The Monte Carlo simulations mirrored the real-life application of the two correlated dimensions of Necessary Operations and Calculations in the basic mathematics domain. In all scenarios, the MC1-PL IRT model showed greater precision in the recovery of the true underlying item difficulty values and person theta values along each primary dimension as well as along a …