Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistical Models

2017

Articles 31 - 60 of 81

Full-Text Articles in Physical Sciences and Mathematics

Visualizing Lab And Phenotype Associations Using PheWAS And Electronic Health Records, Brenda Emerson, Miriam Goldman, Sahiti Kolli Jul 2017

Honors Projects

As patient health records are increasingly digitized, we have a great opportunity to analyze these records and, we hope, make discoveries about diseases or medicines. Given large datasets of Electronic Health Records, two other students and I set out to look for novel associations between phenotypes and mean lab values, to test whether the presence of a lab was associated with a phenotype, and to create an interactive application to visualize the associations between labs and phenotypes.


Burden Of Atopic Dermatitis In The United States: Analysis Of Healthcare Claims Data In The Commercial, Medicare, And Medi-Cal Databases, Sulena Shrestha, Raymond Miao, Li Wang, Jingdong Chao, Huseyin Yuce, Wenhui Wei Jul 2017

Publications and Research

Comparative data on the burden of atopic dermatitis (AD) in adults relative to the general population are limited. We performed a large-scale evaluation of the burden of disease among US adults with AD relative to matched non-AD controls, encompassing comorbidities, healthcare resource utilization (HCRU), and costs, using healthcare claims data. The impact of AD disease severity on these outcomes was also evaluated.


Information Metrics For Predictive Modeling And Machine Learning, Kostantinos Gourgoulias Jul 2017

Doctoral Dissertations

The ever-increasing complexity of the models used in predictive modeling and data science and their use for prediction and inference has made the development of tools for uncertainty quantification and model selection especially important. In this work, we seek to understand the various trade-offs associated with the simulation of stochastic systems. Some trade-offs are computational, e.g., execution time of an algorithm versus accuracy of simulation. Others are analytical: whether or not we are able to find tractable substitutes for quantities of interest, e.g., distributions, ergodic averages, etc. The first two chapters of this thesis deal with the study of the …


Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu Jul 2017

Doctoral Dissertations

In this thesis, we propose statistical models for addressing commonly encountered data types and study designs in large epidemiologic investigations aimed at understanding the molecular basis of complex disorders. The motivating applications come from diverse disease areas in Women's Health, including the study of type II diabetes in the Women's Health Initiative (WHI), invasive breast cancer in the Nurses' Health Study and the study of the metabolomic underpinnings of cardiovascular disease in the WHI. We have also put significant effort into making the implementation of the proposed methods accessible through freely available, user-friendly software packages in R. The first chapter …


Factor Based Statistical Arbitrage In The U.S. Equity Market With A Model Breakdown Detection Process, Seoungbyung Park Jul 2017

Master's Theses (2009 -)

Many researchers have studied statistical arbitrage strategies that aim to provide a steady stream of returns unrelated to market conditions. Among these strategies, factor-based mean-reverting strategies have been popular and widely covered. This thesis aims to add value by evaluating the generalized pairs trading strategy and suggesting enhancements to improve out-of-sample performance. The enhanced strategy generated a daily Sharpe ratio of 6.07% in the out-of-sample period from January 2013 through October 2016, with a correlation of -0.03 with the S&P 500. During the same period, the S&P 500 generated a Sharpe ratio of 6.03%. This thesis is …
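
As an illustration of the two headline statistics above, here is a minimal Python sketch of a per-period Sharpe ratio (mean excess return divided by its sample standard deviation) and a Pearson correlation between two return series. It assumes plain lists of daily returns and a constant per-period risk-free rate; the thesis does not state its exact convention, so the function names, the `risk_free` parameter, and the sample returns are illustrative assumptions, not the author's code.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return / sample std of excess return.

    `risk_free` is an assumed constant per-period rate; no annualization is applied.
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def correlation(xs, ys):
    """Pearson correlation between two equally long return series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical daily returns, for illustration only.
strategy = [0.002, -0.001, 0.003, 0.001, 0.000]
market = [0.004, 0.001, -0.002, 0.003, -0.001]
print(sharpe_ratio(strategy), correlation(strategy, market))
```

Both functions need at least two observations, and `correlation` is undefined when either series is constant.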


Mixture Models For Undiagnosed Prevalent Disease And Interval-Censored Incident Disease: Applications To A Cohort Assembled From Electronic Health Records, Li C Cheung, Qing Pan, Noorie Hyun, Mark Schiffman, Barbara Fetterman, Philip E Castle, Thomas Lorey, Hormuzd A Katki Jun 2017

Epidemiology Faculty Publications

For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and that there can be pre-existing (prevalent) disease. Because prevalent disease is not always diagnosed immediately, some disease diagnosed at later visits is actually undiagnosed prevalent disease. We treat prevalent disease as a point mass at time zero for clinical applications in which the time of prevalent disease onset is not of interest. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early …
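
For reference, the naive Kaplan-Meier estimator mentioned above is the product-limit estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk; cumulative risk is 1 - S(t). The minimal Python sketch below assumes exactly observed event times with right censoring only. It deliberately does not model interval censoring or the prevalent-disease point mass, which are what the paper's mixture model adds; the function name and inputs are illustrative.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) for right-censored data.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns (event_time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every subject sharing this time; subjects censored at t
        # still count as at risk for events at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Cumulative risk at each event time is 1 - survival.
for t, s in kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]):
    print(t, 1 - s)
```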


Gridiron-Gurus Final Report: Fantasy Football Performance Prediction, Kyle Tanemura, Michael Li, Erica Dorn, Ryan Mckinney Jun 2017

Computer Science and Software Engineering

Gridiron Gurus is a desktop application for creating custom AI profiles that can advise users and compete against them in a Fantasy Football setting. Our AIs statistically predict player performance on both a season-long and a week-to-week basis, which lets them both draft and manage a fantasy football team throughout a season.


Mechanistic Mathematical Models: An Underused Platform For HPV Research, Marc Ryser, Patti Gravitt, Evan R. Myers Jun 2017

Global Health Faculty Publications

Health economic modeling has become an invaluable methodology for the design and evaluation of clinical and public health interventions against the human papillomavirus (HPV) and associated diseases. At the same time, relatively little attention has been paid to a different yet complementary class of models, namely that of mechanistic mathematical models. The primary focus of mechanistic mathematical models is to better understand the intricate biologic mechanisms and dynamics of disease. Inspired by a long and successful history of mechanistic modeling in other biomedical fields, we highlight several areas of HPV research where mechanistic models have the potential to advance the …


Firing Rate Heterogeneity And Consequences For Coding In Feedforward Circuits, Cheng Ly, Gary Marsat May 2017

Biology and Medicine Through Mathematics Conference

No abstract provided.


Methods For Parameter Estimation Of A Stochastic SEIR Model, Kaitlyn Martinez May 2017

Biology and Medicine Through Mathematics Conference

No abstract provided.


Shape Features Underlying The Perception Of Liquids, Jan Jaap R. Van Assen, Pascal Barla, Roland W. Fleming May 2017

MODVIS Workshop

No abstract provided.


Mortgage Transition Model Based On Loanperformance Data, Shuyao Yang May 2017

Arts & Sciences Electronic Theses and Dissertations

The unexpected increase in loan defaults in the mortgage market is widely considered to be one of the main causes of the economic crisis. To provide some insight into loan delinquency and default, I analyze the mortgage performance data from the Fannie Mae website and investigate how economic factors and individual loan and borrower information affect default and prepayment. The various delinquency statuses, including default and prepayment, are treated as discrete states of a Markov chain, and one-step transition probabilities are estimated via multinomial logistic models. We find that, in general, current loan-to-value ratio, credit score, unemployment rate, and interest …
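
As a covariate-free baseline for the transition probabilities described above, the maximum-likelihood estimate for a time-homogeneous Markov chain is simply the count ratio p_ij = n_ij / sum_k n_ik over observed month-to-month moves. (An intercept-only multinomial logistic model fitted separately for each origin state reproduces these same proportions; the thesis's covariate models refine them.) This Python sketch is an illustration, not the author's code; the state codes "C", "D30", "D60", "DF" and "P" are hypothetical labels for current, 30/60 days delinquent, default and prepaid.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(histories):
    """Maximum-likelihood one-step transition probabilities for a
    time-homogeneous Markov chain, pooled over all loan histories.

    histories -- iterable of state sequences, one per loan (e.g. monthly statuses)
    Returns {from_state: {to_state: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in histories:
        # Count every consecutive (state_t, state_t+1) pair.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: n / total for b, n in row.items()}
    return matrix


# Hypothetical monthly status histories for three loans.
loans = [
    ["C", "C", "D30", "C", "P"],
    ["C", "D30", "D60", "DF"],
    ["C", "C", "C"],
]
print(estimate_transition_matrix(loans))
```

Absorbing states such as default or prepayment never appear as an origin state, so they get no row; in a full analysis their rows would be fixed to stay in place with probability 1.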


A Multifactorial Obesity Model Developed From Nationwide Public Health Exposome Data And Modern Computational Analyses, Lisaann S. Gittner, Barbara J. Kilbourne, Ravi Vadapalli, Hafiz M.K. Khan, Michael A. Langston May 2017

Sociology Faculty Research

Summary

Statement of the problem

Obesity is both multifactorial and multimodal, making it difficult to identify, unravel and distinguish causative and contributing factors. The lack of a clear model of aetiology hampers the design and evaluation of interventions to prevent and reduce obesity.

Methods

Using modern graph-theoretical algorithms, we are able to coalesce and analyse thousands of inter-dependent variables and interpret their putative relationships to obesity. Our modelling is different from traditional approaches; we make no a priori assumptions about the population, and model instead based on the actual characteristics of a population. Paracliques, noise-resistant collections of highly-correlated variables, are …


Multidataset Independent Subspace Analysis: A Framework For Analysis Of Multimodal, Multi-Subject Brain Imaging Data, Rogers F. Silva May 2017

Electrical and Computer Engineering ETDs

Mental illnesses are serious disorders of the brain that have devastating effects on individuals and society. In addition to their disabling and impairing effects, mental illnesses have deep social and economic implications, accounting for an estimated loss of 12 billion working days and care costs projected to surge to $6 trillion a year by 2030. For diseases such as depression and anxiety, enhancing preventive programs and treatment accessibility, in combination with accurate early diagnosis and personalized treatments, is projected to yield a four-fold return on every dollar invested, a strategy that can drastically help curtail those losses. Notably, within the …


Gilmore Girls And Instagram: A Statistical Look At The Popularity Of The Television Show Through The Lens Of An Instagram Page, Brittany Simmons May 2017

Student Scholar Symposium Abstracts and Posters

After going on the Warner Brothers Tour in December of 2015, I created a Gilmore Girls Instagram account. This account, which started off as a way for me to create edits of the show and post my photos from the tour, turned into something bigger than I ever could have imagined. In just over a year, I have gained over 55,000 followers. I post content including revival news, merchandise, and edits of the show that have been featured in Entertainment Weekly, Bustle, E! News, People Magazine, Yahoo News, & GilmoreNews.

I created a dataset of qualitative and quantitative outcomes from my …


Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers May 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Survival analysis methods are a mainstay of the biomedical fields but are finding increasing use in other disciplines, including finance and engineering. A widely used tool in survival analysis is the Cox proportional hazards regression model. For this model, all the predicted survivor curves have the same basic shape, which may not be a good approximation to reality. In contrast, Random Survival Forests do not make the proportional hazards assumption and have the flexibility to model survivor curves of quite different shapes for different groups of subjects. We applied both techniques to a number of publicly available …
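
To make the "same basic shape" point concrete, the standard Cox model relations are written out below. The hazard for covariates x is a fixed multiple of a baseline hazard h_0(t), so every predicted survivor curve is a power of one baseline curve S_0(t). Because 0 <= S_0(t) <= 1, curves for different covariate values are ordered for all t and cannot cross, which is the restriction that random survival forests relax.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp(\mathbf{x}^\top\boldsymbol{\beta}),
\qquad
S(t \mid \mathbf{x})
  = \exp\!\Big(-\int_0^t h(u \mid \mathbf{x})\,du\Big)
  = S_0(t)^{\exp(\mathbf{x}^\top\boldsymbol{\beta})},
\qquad
S_0(t) = \exp\!\Big(-\int_0^t h_0(u)\,du\Big).
```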


A Comparison Of Statistical Methods Relating Pairwise Distance To A Binary Subject-Level Covariate, Rachael Stone May 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

A community ecologist provided a motivating data set involving a certain animal species with two behavior groups, along with a pairwise genetic distance matrix among individuals. Many community ecologists have analyzed similar data sets with a method known as the Hopkins method, testing for an association between the subject-level covariate (behavior group) and the pairwise distance. This community ecologist wanted to know whether results obtained with the Hopkins method would be meaningful. Their question inspired this thesis work, in which a different data set was used for confidentiality reasons. Multiple methods (Hopkins method, ADONIS, ANOSIM, and Distance Regression) were used …
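
Of the methods listed above, ANOSIM is the easiest to sketch. Clarke's R statistic ranks all M = n(n-1)/2 pairwise distances, then compares the mean rank of between-group pairs (r_B) with the mean rank of within-group pairs (r_W): R = (r_B - r_W) / (M / 2). Significance comes from permuting the group labels. This is a minimal pure-Python illustration, not the thesis's code or any particular package's implementation; the function names and the toy data are assumptions.

```python
import random
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """Clarke's ANOSIM R for a square distance matrix and group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim_test(dist, groups, permutations=999, seed=0):
    """Observed R and a one-sided permutation p-value from shuffled labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: two well-separated groups of points on a line.
pts = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pts] for a in pts]
print(anosim_test(dist, ["A"] * 3 + ["B"] * 3))
```

In the toy example every between-group distance exceeds every within-group distance, so R = 1; with only six subjects there are just ten distinct splits, so the p-value cannot fall much below 0.1.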


Inference On The Stress-Strength Model From Weibull Gamma Distribution, Mahmoud Mansour, Rashad El-Sagheer, M. A. W. Mahmoud Prof. May 2017

Basic Science Engineering

No abstract provided.


Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch May 2017

Electronic Theses and Dissertations

Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of the data sample; hence, population estimates and model parameters estimated from such data are likely to be biased.

However, missing data is an active area of study, and better alternative statistical procedures have been proposed to mitigate its shortcomings. In this paper, we review the causes of missing data and various methods of handling it. Our main focus is evaluating various multiple imputation (MI) methods from the Multivariate Imputation by Chained Equations (MICE) package in the statistical software …


A Bayesian Variable Selection Method With Applications To Spatial Data, Xiahan Tang May 2017

Graduate Theses and Dissertations

This thesis first describes the general idea behind Bayesian inference, various sampling methods based on Bayes' theorem, and many examples. Then a Bayesian approach to model selection, called Stochastic Search Variable Selection (SSVS), is discussed. It was originally proposed by George and McCulloch (1993). In a normal regression model with a large number of covariates, only a small subset tends to be significant most of the time. This Bayesian procedure specifies a mixture prior for each unknown regression coefficient; the mixture prior was originally proposed by Geweke (1996). This mixture prior will be updated as data becomes …
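
One common way of writing such a mixture prior is the normal "spike-and-slab" form from George and McCulloch's SSVS, shown below. Each coefficient beta_j has a latent indicator gamma_j: with gamma_j = 0 it is drawn from a narrow "spike" (small tau_j), with gamma_j = 1 from a wide "slab" (large c_j). In the basic hierarchy, where the data depend on gamma only through beta, the Gibbs update for gamma_j is the Bernoulli probability on the second line. The thesis's exact prior (attributed above to Geweke, 1996) may differ, for example by replacing the spike with a point mass at zero.

```latex
\beta_j \mid \gamma_j \sim (1-\gamma_j)\,\mathcal{N}(0,\tau_j^2) + \gamma_j\,\mathcal{N}(0,c_j^2\tau_j^2),
\qquad \Pr(\gamma_j = 1) = p_j,

\Pr(\gamma_j = 1 \mid \beta_j) =
\frac{p_j\,\phi(\beta_j;\,0,\,c_j^2\tau_j^2)}
     {p_j\,\phi(\beta_j;\,0,\,c_j^2\tau_j^2) + (1-p_j)\,\phi(\beta_j;\,0,\,\tau_j^2)}.
```

Here phi(x; 0, s^2) denotes the normal density with mean 0 and variance s^2; covariates whose indicators are frequently 1 across the Gibbs draws are the ones selected.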


Statistical Methods For Two Problems In Cancer Research: Analysis Of RNA-Seq Data From Archival Samples And Characterization Of Onset Of Multiple Primary Cancers, Jialu Li May 2017

Dissertations & Theses (Open Access)

My dissertation is focused on quantitative methodology development and application for two important topics in translational and clinical cancer research.

The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixed, paraffin-embedded (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different protocols for preparing RNA-seq libraries from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the FFPE RNA-seq data quality for expression profiling, we developed multiple computational methods for assessment, such as the uniformity and continuity …


Modelling Cash Crop Growth In TN, Spencer Weston May 2017

Chancellor’s Honors Program Projects

No abstract provided.


On Post-Selection Confidence Intervals In Linear Regression, Xinwei Zhang May 2017

Arts & Sciences Electronic Theses and Dissertations

The general goal of this thesis is to investigate some issues in post-selection inference, which arises when statistical inference is carried out after a data-driven model selection step. In this setting, classical inference theory, which requires a model fixed a priori, becomes invalid because the selected model is itself the result of a random event. Hence, the common applied practice of ignoring the selection step and constructing confidence intervals as usual can lead to misleading or even false conclusions. In this thesis, specifically, we first discuss some examples to show how the classical inference theory loses …


Statistical Analysis Of Markovian Queueing Models Of Limit Order Books, Yiyao Luo May 2017

Arts & Sciences Electronic Theses and Dissertations

The objective of this thesis is to investigate how well some Markovian queueing models describe the dynamic properties of a limit order book. We review and compare the assumptions proposed by Huang et al. [Quantitative Finance, 12, 547-557 (2012)] and Cont et al. [SIAM Journal on Financial Mathematics, 4, 1-25 (2013)], and estimate the intensity parameters under both, based on real data for a stock on the Nasdaq Stock Market. By comparing the cumulative distribution functions of the first-passage time to state 0, we will show that the estimators of Cont's model fit our data better, and we put …
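
The first-passage comparison above can be illustrated with a deliberately simplified model: a queue that grows by one at a constant rate `up_rate` and shrinks by one at a constant rate `down_rate`, simulated until it first empties, with an empirical CDF of the hitting times. Both rates, the function names, and the example values are assumptions for illustration; neither Huang et al.'s nor Cont et al.'s exact specification is reproduced here, and fitted intensities would replace the placeholders.

```python
import random


def first_passage_time(q0, up_rate, down_rate, rng, max_time=float("inf")):
    """Simulate a birth-death queue from size q0 until it first hits 0.

    up_rate   -- rate at which the queue grows by one (e.g. limit order arrivals)
    down_rate -- rate at which it shrinks by one (e.g. market orders + cancellations)
    Returns the hitting time, or None if max_time is exceeded first.
    """
    q, t = q0, 0.0
    total = up_rate + down_rate
    while q > 0:
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > max_time:
            return None
        # The next event is a decrease with probability down_rate / total.
        q += -1 if rng.random() < down_rate / total else 1
    return t


def empirical_cdf(samples, t):
    """Fraction of simulations whose passage time is <= t (None counts as not hit)."""
    return sum(s is not None and s <= t for s in samples) / len(samples)


# Hypothetical rates, for illustration only.
rng = random.Random(42)
samples = [first_passage_time(5, 1.0, 1.2, rng) for _ in range(1000)]
print(empirical_cdf(samples, 10.0))
```

Comparing two models then amounts to computing this CDF under each set of fitted rates and overlaying it on the CDF of passage times observed in the data.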


Network Exploration Of Correlated Multivariate Protein Data For Alzheimer's Disease Association, Matthew J. Lane Apr 2017

Theses

Alzheimer's disease (AD) is difficult to diagnose using genetic testing or other traditional methods. Unlike diseases with simple genetic risk components, AD has no single marker that determines whether someone will develop it. Furthermore, AD is highly heterogeneous, and different subgroups of individuals develop the disease due to differing factors. Traditional diagnostic methods based on perceivable cognitive deficiencies often come too little, too late, because the brain has already suffered damage from decades of disease progression. To observe AD at early stages, before cognitive deficiencies appear, more accurate biomarkers are required. By using …


A General Approach For Predicting The Behavior Of The Supreme Court Of The United States, Daniel Katz Apr 2017

All Faculty Scholarship

Building on developments in machine learning and prior work in the science of judicial prediction, we construct a model designed to predict the behavior of the Supreme Court of the United States in a generalized, out-of-sample context. To do so, we develop a time-evolving random forest classifier that leverages unique feature engineering to predict more than 240,000 justice votes and 28,000 case outcomes over nearly two centuries (1816-2015). Using only data available prior to decision, our model outperforms null (baseline) models at both the justice and case level under both parametric and non-parametric tests. Over nearly two centuries, we achieve …


Implementing Propensity Score Matching With Network Data: The Effect Of GATT On Bilateral Trade, Luca De Benedictis, Bruno Arpino, Alessandra Mattei Mar 2017

Luca De Benedictis

Motivated by the evaluation of the causal effect of the General Agreement on Tariffs and Trade on bilateral international trade flows, we investigate the role of network structure in propensity score matching under the assumption of strong ignorability. We study the sensitivity of causal inference with respect to the presence of characteristics of the network in the set of confounders conditional on which strong ignorability is assumed to hold. We find that estimates of the average causal effect are highly sensitive to the presence of node-level network statistics in the set of confounders. Therefore, we argue that estimates may suffer …
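
The sensitivity analysis above concerns which variables enter the propensity score model; once scores are estimated, the matching step itself can be sketched as 1:1 nearest-neighbor matching with replacement, averaging treated-minus-matched-control outcome differences to estimate the average effect on the treated (ATT). This Python sketch assumes the scores are already fitted elsewhere; the function name and toy numbers are hypothetical, and the paper's actual matching choices (ratio, caliper, replacement) are not stated in the abstract.

```python
def nearest_neighbor_att(scores, treated, outcomes):
    """ATT from 1:1 nearest-neighbor matching on propensity scores, with replacement.

    scores   -- estimated propensity scores (assumed already fitted elsewhere)
    treated  -- 1/True for treated units, 0/False for controls
    outcomes -- observed outcome for every unit
    """
    controls = [i for i, t in enumerate(treated) if not t]
    effects = []
    for i, t in enumerate(treated):
        if t:
            # Closest control on the score; ties go to the first one found.
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            effects.append(outcomes[i] - outcomes[j])
    return sum(effects) / len(effects)


# Hypothetical units: two treated, two controls.
print(nearest_neighbor_att([0.1, 0.2, 0.8, 0.75], [1, 0, 1, 0], [5, 3, 10, 4]))
```

Adding or dropping node-level network statistics changes the fitted scores, and therefore which controls are matched, which is one route by which the ATT estimate can shift.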


Statistically Analyzing Assembly Line Processing Times Through Incorporation Of Product Variation, Kyle Rehr, Matthew Farr Mar 2017

Scholars Week

Timing methods and performance metrics are important in the heavily industrialized world we live in. Industrial plants use metrics to measure quality of production, help make decisions, and drive the strategy of the organization. However, there are many factors to consider when measuring performance based on a metric; of these, we will analyze the importance of product variation. We will analyze assembly line timings, while controlling for product variance, to show how much differences between products affect one's ability to predict performance. In addition, we will be analyzing the current “statistical” methods used by an industrial …


Maximum Likelihood Estimation Of Parameters In Exponential Power Distribution With Upper Record Values, Tianchen Zhi Mar 2017

FIU Electronic Theses and Dissertations

The exponential power (EP) distribution is an important distribution that has been used in survival analysis and is related to the asymmetric EP distribution. Many researchers have discussed statistical inference about the parameters of the EP distribution using i.i.d. random samples. However, sometimes the available data contain only record values, or it is more convenient for researchers to collect record values. We aim to resolve this problem. We estimated the two parameters of the EP distribution by MLE using upper record values. In a simulation study, we used the bias and MSE of the estimators to study the efficiency of the proposed estimation method. …
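
For context, the likelihood for upper record values differs from the i.i.d. one. If r_1 < r_2 < ... < r_n are the observed upper records from a continuous distribution with density f(x; theta) and cdf F(x; theta), their joint density is the standard record-value likelihood below, written equivalently in terms of the hazard h and cumulative hazard H. Substituting the EP density and cdf and maximizing over the two parameters gives MLEs of the kind the abstract describes; the EP parameterization itself is not written out here because the abstract does not specify which form is used.

```latex
L(\theta \mid r_1,\dots,r_n)
  = f(r_n;\theta)\prod_{i=1}^{n-1}\frac{f(r_i;\theta)}{1-F(r_i;\theta)}
  = \Big[\prod_{i=1}^{n} h(r_i;\theta)\Big]\exp\{-H(r_n;\theta)\},

\ell(\theta) = \sum_{i=1}^{n}\log h(r_i;\theta) - H(r_n;\theta),
\qquad h = \frac{f}{1-F}, \qquad H = -\log(1-F).
```

The second equality uses f = h(1 - F) = h exp(-H) for the last record; only the largest record enters through the cumulative hazard.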


Shining A Light On A University Special Collection With Data Visualization, Lisa Deluca, Katie M. Wissel Mar 2017

Kathryn Wissel, MBA, MI

The Valente Collection is a 29,000-volume special collection that bridges Italian and Italian American history, literature, religion, and art. It is a unique asset for the library and the university. One approach to promoting this collection and offering insight into its holdings is visualization. The goal of this poster is to help academic librarians assess which tools are most appropriate for creating visualizations of current collections. Examples of different visualization types are explained, including Excel Power Map, Tableau, and Datawrapper.