Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistical Models

COBRA

2017

Full-Text Articles in Physical Sciences and Mathematics

Optimized Variable Selection Via Repeated Data Splitting, Marinela Capanu, Colin B. Begg, Mithat Gonen Jan 2017

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

We introduce a new variable selection procedure that repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold, which is then used to screen variables for inclusion in the final model. Simulation results show that the proposed technique outperforms the candidate methods: it is among those with the lowest inclusion of noisy predictors, has the highest power to detect the correct model, and is unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy …
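The abstract describes the procedure only in outline, so the following is a minimal sketch of one way repeated data splitting could be used to tune a screening threshold, not the authors' implementation. Several choices here are assumptions that the abstract does not state: a linear regression model, absolute t-statistics as the screened quantity, mean squared error on the validation half as the criterion, equal-sized splits, and a user-supplied grid of candidate thresholds. All function names are hypothetical.

```python
import numpy as np


def t_statistics(X, y):
    """Absolute OLS t-statistics for each column of X (an intercept is added)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p - 1)
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))


def validation_mse(X_est, y_est, X_val, y_val, keep):
    """Fit OLS on the estimation set using the kept columns; score on validation."""
    A_est = np.column_stack([np.ones(len(y_est)), X_est[:, keep]])
    A_val = np.column_stack([np.ones(len(y_val)), X_val[:, keep]])
    beta, *_ = np.linalg.lstsq(A_est, y_est, rcond=None)
    return float(np.mean((y_val - A_val @ beta) ** 2))


def select_by_repeated_splitting(X, y, thresholds, n_splits=50, seed=0):
    """Pick the threshold with the lowest average validation error over
    repeated random splits, then screen variables on the full data with it.

    Assumption: the screened quantity is |t|; the abstract does not say.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    errors = np.zeros((n_splits, len(thresholds)))
    for s in range(n_splits):
        perm = rng.permutation(n)
        est, val = perm[: n // 2], perm[n // 2:]  # assumed 50/50 split
        t_abs = t_statistics(X[est], y[est])
        for j, c in enumerate(thresholds):
            keep = t_abs > c  # boolean mask of variables passing the screen
            errors[s, j] = validation_mse(X[est], y[est], X[val], y[val], keep)
    best = thresholds[int(np.argmin(errors.mean(axis=0)))]
    selected = np.flatnonzero(t_statistics(X, y) > best)
    return best, selected
```

Calling `select_by_repeated_splitting(X, y, np.linspace(0.5, 4.0, 8))` on a numeric design matrix `X` and response `y` returns the chosen threshold and the indices of the retained columns. Averaging the validation error over many splits, rather than relying on a single split, is meant to reduce the dependence of the chosen threshold on one particular partition of the data.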