Maximum Entropy Classification For Record Linkage, 2020 University of Alabama
Maximum Entropy Classification For Record Linkage, Danhyang Lee, Li-Chun Zhang, Jae Kwang Kim
By record linkage one joins records residing in separate files which are believed to be related to the same entity. In this paper we approach record linkage as a classification problem, and adapt the maximum entropy classification method in text mining to record linkage, both in the supervised and unsupervised settings of machine learning. The set of links will be chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is scalable and fully automatic, unlike ...
Semiparametric Imputation Using Conditional Gaussian Mixture Models Under Item Nonresponse, 2020 University of Alabama
Semiparametric Imputation Using Conditional Gaussian Mixture Models Under Item Nonresponse, Danhyang Lee, Jae Kwang Kim
Imputation is a popular technique for handling item nonresponse in survey sampling. Parametric imputation is based on a parametric model for imputation and is less robust against the failure of the imputation model. Nonparametric imputation is fully robust but is not applicable when the dimension of covariates is large due to the curse of dimensionality. Semiparametric imputation is another robust imputation based on a flexible model where the number of model parameters can increase with the sample size. In this paper, we propose another semiparametric imputation based on a more flexible model assumption than the Gaussian mixture model. In the ...
RDC Data Alternatives: Conducting Research During COVID-19, 2020 Western University
RDC Data Alternatives: Conducting Research During COVID-19, Kristi Thompson, Elizabeth Hill
Western Libraries Presentations
Recent physical distancing protocols pertaining to the COVID-19 pandemic have meant that RDC researchers need to find alternative ways of carrying out their research. The Real Time Remote Access (RTRA) program offers one alternative way to access confidential Statistics Canada data. Other options include using the Statistics Canada public use files and analyzing data from other sources.
The presenters, data librarians from Western Libraries, will discuss the differences between the data that can be accessed through the RTRA and the RDC. RTRA data is a very useful option for some types of questions but also has some important limitations. We will ...
Do We Need To Reconsider The CMAM Admission And Discharge Criteria?; An Analysis Of CMAM Data In South Sudan, 2020 Seoul National University
Do We Need To Reconsider The CMAM Admission And Discharge Criteria?; An Analysis Of CMAM Data In South Sudan, Eunyong Ahn, Cyprian Ouma, Mesfin Loha, Asrat Dibaba, Wendy Dyment, Jae Kwang Kim, Nam Seon Beck, Taesung Park
Background: Weight-for-height Z-score (WHZ) and Mid Upper Arm Circumference (MUAC) are both commonly used as acute malnutrition screening criteria. However, there is a disparity between the groups they identify as malnourished. Thus, here we aim to investigate the clinical features and linkage with chronicity of the acute malnutrition cases identified by either WHZ or MUAC. Besides, there is evidence indicating that fat restoration is disproportionately rapid compared to muscle gain in hospitalized malnourished children, but related research at community level is lacking. In this study we suggest a proxy measure to inspect body composition restoration responding to malnutrition management ...
Doubly Robust Inference When Combining Probability And Non-Probability Samples With High-Dimensional Data, 2020 North Carolina State University
Doubly Robust Inference When Combining Probability And Non-Probability Samples With High-Dimensional Data, Shu Yang, Jae Kwang Kim, Rui Song
Non-probability samples become increasingly popular in survey statistics but may suffer from selection biases that limit the generalizability of results to the target population. We consider integrating a non-probability sample with a probability sample which provides high-dimensional representative covariate information of the target population. We propose a two-step approach for variable selection and finite population inference. In the first step, we use penalized estimating equations with folded-concave penalties to select important variables for the sampling score of selection into the non-probability sample and the outcome model. We show that the penalized estimating equation approach enjoys the selection consistency property for ...
Characterizing Uncertainty In Correlated Response Variables For Pareto Front Optimization, 2020 Air Force Institute of Technology
Characterizing Uncertainty In Correlated Response Variables For Pareto Front Optimization, Peter A. Calhoun
Theses and Dissertations
Current research provides a method to incorporate uncertainty into Pareto front optimization by simulating additional response surface model parameters according to a Multivariate Normal Distribution (MVN). This research shows that, analogous to the univariate case, the MVN understates uncertainty, leading to overconfident conclusions when variance is not known and there are few observations (fewer than 25-30 per response). This research builds upon current methods using simulated response surface model parameters that are distributed according to a Multivariate t-Distribution (MVT), which can be shown to produce a more accurate inference when variance is not known. The MVT better addresses uncertainty in ...
Combining Non-Probability And Probability Survey Samples Through Mass Imputation, 2020 Iowa State University
Combining Non-Probability And Probability Survey Samples Through Mass Imputation, Jae Kwang Kim, Seho Park, Yulin Chen, Changbao Wu
This paper presents theoretical results on combining non-probability and probability survey samples through mass imputation, an approach originally proposed by Rivers (2007) as sample matching without rigorous theoretical justification. Under suitable regularity conditions, we establish the consistency of the mass imputation estimator and derive its asymptotic variance formula. Variance estimators are developed using either linearization or bootstrap. Finite sample performances of the mass imputation estimator are investigated through simulation studies and an application to analyzing a non-probability sample collected by the Pew Research Centre.
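The mass imputation idea described in this abstract can be illustrated with a minimal sketch. It assumes a simple linear outcome model with a single covariate, which is an illustrative choice and not necessarily the model used in the paper; the function names and the toy numbers are hypothetical. The outcome model is fitted on the non-probability sample (where both x and y are observed), predictions are imputed for every unit in the probability sample (where only x and the survey weights are observed), and the weighted mean of the imputed values estimates the population mean.

```python
from statistics import fmean


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for one covariate."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def mass_imputation_mean(x_nonprob, y_nonprob, x_prob, weights_prob):
    """Weighted mean of model-imputed outcomes over the probability sample.

    Assumption (not from the source): a linear outcome model trained on the
    non-probability sample transfers to the probability sample.
    """
    b0, b1 = fit_simple_linear(x_nonprob, y_nonprob)
    imputed = [b0 + b1 * xi for xi in x_prob]  # y is never observed here
    total_weight = sum(weights_prob)
    return sum(w * yh for w, yh in zip(weights_prob, imputed)) / total_weight


# Hypothetical toy data, chosen only to make the arithmetic easy to follow.
estimate = mass_imputation_mean(
    x_nonprob=[0.0, 1.0, 2.0, 3.0],
    y_nonprob=[2.0, 5.0, 8.0, 11.0],
    x_prob=[1.0, 2.0, 4.0],
    weights_prob=[10.0, 20.0, 10.0],
)
print(estimate)
```

The sketch covers only the point estimator; the linearization and bootstrap variance estimators mentioned in the abstract are not shown.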
The Prevalent Misuse Of Fisher’s Partial Eta Squared Formula, 2020 University of Iowa
The Prevalent Misuse Of Fisher’s Partial Eta Squared Formula, Mariah Cooper
Honors Theses at the University of Iowa
The recording of an estimate of effect size is an essential tool for empirical science because it allows for statistical power. In addition, it enables researchers to replicate studies because it assists in choosing subject amounts effectively. A popular measure of effect size is partial eta squared and is often calculated using Fisher's formula. Despite the positive impact that partial eta provides to empirical researchers, it comes with two problems. One is that researchers are misusing this formula because it was initially made for between-subject designs. When measuring the effect size via partial eta squared in a between-subject design ...
Technological Software In Mathematics, 2020 The University of Akron
Technological Software In Mathematics, Courtney Kish
Williams Honors College, Honors Research Projects
Technology has been advancing significantly over the years. One area that has been affected is mathematics. Technological software has been developed that has allowed for mathematics to be done using software programs. For example, WebAssign allows students to complete online math homework and example practice problems, as well as watch videos to explain topics in math. However, like all things, this is not without downfalls. While this technology offers students access to online lectures and instant feedback, cheating and costly expenses also come with it.
In this research paper, I will discuss the benefits and shortcomings of different technological software ...
Bread Dough Experiment, 2020 Misericordia University
Bread Dough Experiment, Collin Stivala
Student Research Poster Presentations 2020
This is my Final Poster for Design of Experiments. My poster explains the process and results of my experiment, in which I made bread dough, and tested the effects that Flour and Temperature have on bread dough.
Reporting And Analysis Of Split Plot Designs In Preclinical Animal Experiments, 2020 Iowa State University
Reporting And Analysis Of Split Plot Designs In Preclinical Animal Experiments, Pu Liu
The split plot design (SPD) has at least two types of experimental units and at least two levels of complete random design. As a result of this SPD structure, a method of analysis that accounts for the different levels of experimental unit is required, which is commonly a mixed model or a split-plot ANOVA. The design is utilized when it is not feasible to randomize the multiple interventions to the same level. The classic example of a split plot arises from agronomy, and gives name to the design, where the effects of two irrigation methods (factor 1) that must be ...
Power Analysis On A Pilot Study Of The Caloric Intake Of Children Helping Prepare Meals Versus Children Not, 2020 Misericordia University
Power Analysis On A Pilot Study Of The Caloric Intake Of Children Helping Prepare Meals Versus Children Not, Danielle Clifford
Student Research Poster Presentations 2020
The purpose of this analysis is to determine the sample size needed for a study that will be used to discover if there is a difference in the caloric intake of children who help with meal preparation and children who do not help with meal preparation.
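A sample size calculation of this kind can be sketched as follows. The poster's actual method and inputs are not given here, so this is only an illustration under stated assumptions: a two-sided comparison of two group means, equal group sizes, and the normal approximation n = 2((z_{1-alpha/2} + z_{power}) / d)^2 per group, where d is the standardized effect size (difference in means divided by the common standard deviation). The function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample comparison.

    Uses the normal approximation; an exact t-based calculation typically
    asks for one or two more subjects per group.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller standardized effects require more children per group.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

For a medium standardized effect of 0.5 at alpha = 0.05 and 80% power, this approximation gives 63 children per group. In a pilot-study setting, the effect size would be estimated from the pilot data rather than assumed.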
Does Water Boil Faster With Salt?, 2020 Misericordia University
Does Water Boil Faster With Salt?, Soumyadip Acharyya
Student Research Poster Presentations 2020
Whether water boils faster with salt is perhaps a never-ending question. My study has addressed this topic from a statistical perspective. Additionally, I have also investigated whether the water quantity affects the boiling time. I used the two-way Analysis of variance (ANOVA) to analyze and interpret the data.
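The two-way ANOVA mentioned here partitions the total variation into two main effects, their interaction, and error. The sketch below shows that decomposition for a balanced design with replicates in every cell. The boiling times are made-up illustrative numbers, not the poster's data, and the function and factor names are hypothetical. Only sums of squares, degrees of freedom and F statistics are computed; p-values would need an F distribution from a statistics package.

```python
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way ANOVA. cells maps (a_level, b_level) to replicates."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n_a, n_b = len(a_levels), len(b_levels)
    n_rep = len(next(iter(cells.values())))
    if len(cells) != n_a * n_b or any(len(v) != n_rep for v in cells.values()):
        raise ValueError("design must be complete and balanced")
    if n_rep < 2:
        raise ValueError("need at least two replicates per cell")

    grand = fmean([y for v in cells.values() for y in v])
    a_mean = {a: fmean([y for b in b_levels for y in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: fmean([y for a in a_levels for y in cells[(a, b)]]) for b in b_levels}
    cell_mean = {key: fmean(v) for key, v in cells.items()}

    ss = {
        "A": n_b * n_rep * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n_a * n_rep * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AB": n_rep * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels for b in b_levels
        ),
        "error": sum((y - cell_mean[key]) ** 2 for key, v in cells.items() for y in v),
    }
    df = {"A": n_a - 1, "B": n_b - 1, "AB": (n_a - 1) * (n_b - 1),
          "error": n_a * n_b * (n_rep - 1)}
    ms_error = ss["error"] / df["error"]
    # F statistic for each effect: effect mean square over error mean square.
    f_stat = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f_stat


# Hypothetical boiling times in seconds: factor A = salt, factor B = water amount.
times = {
    ("no_salt", "1_cup"): [100.0, 104.0],
    ("no_salt", "2_cups"): [150.0, 154.0],
    ("salt", "1_cup"): [98.0, 102.0],
    ("salt", "2_cups"): [149.0, 151.0],
}
ss, df, f_stat = two_way_anova(times)
print(ss, df, f_stat)
```

In a balanced design the four sums of squares add up to the total sum of squares, which is a convenient check on the arithmetic.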
The Effect Of The Amount Of Water And Water Exposure Time On The Absorbency Of Sponges, 2020 Misericordia University
The Effect Of The Amount Of Water And Water Exposure Time On The Absorbency Of Sponges, Danielle Clifford
Student Research Poster Presentations 2020
Given the current global pandemic, it is now more important than ever to understand what factors lead to the best absorbency in a sponge, so as to stop the spread of bacteria and germs. The purpose of the experiment is to determine the effect of the amount of time (15 seconds, 30 seconds, 45 seconds, 60 seconds, 75 seconds, and 90 seconds) and the amount of water (24 ounces, 32 ounces, and 40 ounces) on the absorbency of a sponge.
Statistical Data Integration In Survey Sampling: A Review, 2020 North Carolina State University
Statistical Data Integration In Survey Sampling: A Review, Shu Yang, Jae Kwang Kim
Finite population inference is a central goal in survey sampling. Probability sampling is the main statistical approach to finite population inference. Challenges arise due to high cost and increasing non-response rates. Data integration provides a timely solution by leveraging multiple data sources to provide more robust and efficient inference than using any single data source alone. The technique for data integration varies depending on types of samples and available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a ...
Variance Estimation After Kernel Ridge Regression Imputation, 2020 Iowa State University
Variance Estimation After Kernel Ridge Regression Imputation, Hengfang Wang, Jae Kwang Kim
Statistics Conference Proceedings, Presentations and Posters
Imputation is a popular technique for handling missing data. Variance estimation after imputation is an important practical problem in statistics. In this paper, we consider variance estimation of the imputed mean estimator under the kernel ridge regression imputation. We consider a linearization approach which employs the covariate balancing idea to estimate the inverse of propensity scores. The statistical guarantee of our proposed variance estimation is studied when a Sobolev space is utilized to do the imputation, where n-consistency can be obtained. Synthetic data experiments are presented to confirm our theory.
A Two-Stage Design For Comparing Binomial Treatments With A Standard, 2020 University of North Florida
A Two-Stage Design For Comparing Binomial Treatments With A Standard, Cecelia K. Schmidt
UNF Graduate Theses and Dissertations
We propose a method for comparing success rates of several populations among each other and against a desired standard success rate. This design is appropriate for a situation in which all experimental treatments have only two outcomes that can be considered “success” and “failure” respectively. The goal is to identify which treatment has the highest rate of success that is also higher than the desired standard. The design combines elements of both hypothesis testing and statistical selection. At the first stage, if none of the samples have a number of successes above the appropriate standard for the design, the experiment ...
Public Perception Of Different Planting Techniques Using Augmented Reality, 2020 Georgia Southern University
Public Perception Of Different Planting Techniques Using Augmented Reality, Sultana Quader Tania
Electronic Theses and Dissertations
The objective of this study was to measure public perception of the different planting techniques (block and matrix), which are used at visitor information centers (VICs) and other rights of way (ROW) areas. The main factors that affect public perception of planting techniques were identified through an extensive literature review and qualitative survey from four welcome centers in the state of Georgia. The ranking of those indicators, based on public preferences, was discovered through a quantitative survey. During the first phase of the quantitative survey, images of block and matrix were used. An iOS-based user-friendly and cost-effective augmented reality (AR ...
A Note On Propensity Score Weighting Method Using Paradata In Survey Sampling, 2019 Dartmouth College
A Note On Propensity Score Weighting Method Using Paradata In Survey Sampling, Seho Park, Jae Kwang Kim, Kimin Kim
Paradata is often collected during the survey process to monitor the quality of the survey response. One such paradata is a respondent behavior, which can be used to construct response models. The propensity score weight using the respondent behavior information can be applied to the final analysis to reduce the nonresponse bias. However, including the surrogate variable in the propensity score weighting does not always guarantee the efficiency gain. We show that the surrogate variable is useful only when it is correlated with the study variable. Results from a limited simulation study confirm the finding. A real data application using ...
Trends And Disparities In Self-Reported And Measured Osteoporosis Among US Adults, 2007-2014., 2019 University of Nevada, Las Vegas
Trends And Disparities In Self-Reported And Measured Osteoporosis Among US Adults, 2007-2014., Qing Wu, Yingke Xu, Ge Lin
Environmental & Occupational Health Faculty Publications
(1) Background: Studies examining osteoporosis trends among US adults by different socioeconomic status (SES) are limited. The prevalence of self-reported osteoporosis in the US is rarely reported. (2) Methods: Data from the National Health and Nutritional Examination Survey (NHANES) between 2007–2008 and 2013–2014 cycles were analyzed. Age-adjusted prevalence of self-reported and that of measured osteoporosis were calculated overall and by sex, race/ethnicity, education attainment, and SES. (3) Results: The prevalence of self-reported osteoporosis was higher than that of measured osteoporosis in all three survey cycles for women, and in 2007–2008 and 2009–2010 for men. Participants ...