Functional Regression, 2015 The University of Texas

#### Functional Regression, Jeffrey S. Morris

*Jeffrey S. Morris*

Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay ...

Spatiotemporal Crime Analysis, 2014 Purdue University

#### Spatiotemporal Crime Analysis, James Q. Tay, Abish Malik, Sherry Towers, David Ebert

*The Summer Undergraduate Research Fellowship (SURF) Symposium*

There has been a rise in the use of visual analytic techniques to create interactive predictive environments in a range of different applications. These tools help the user sift through massive amounts of data, presenting the most useful results in a visual context and enabling the user to rapidly form proactive strategies. In this paper, we present one such visual analytic environment that uses historical crime data to predict future occurrences of crimes, both geographically and temporally. Due to the complexity of this analysis, it is necessary to find an appropriate statistical method for correlative analysis of spatiotemporal data, as well ...

A Bayesian Approach To Joint Modeling Of Menstrual Cycle Length And Fecundity, 2014 COBRA

#### A Bayesian Approach To Joint Modeling Of Menstrual Cycle Length And Fecundity, Kristen J. Lum, Rakesjwaro Sundaram, Germaine M. Buck-Louis, Thomas A. Louis

*Johns Hopkins University, Dept. of Biostatistics Working Papers*

Female menstrual cycle length is thought to play an important role in couple fecundity, or the biologic capacity for reproduction irrespective of pregnancy intentions. A complete assessment of the association between menstrual cycle length and fecundity requires a model that accounts for multiple risk factors (both male and female) and the couple's intercourse pattern relative to ovulation. We employ a Bayesian joint model consisting of a mixed effects accelerated failure time model for longitudinal menstrual cycle lengths and a hierarchical model for the conditional probability of pregnancy in a menstrual cycle given no pregnancy in previous cycles of trying ...

Applying Multiple Imputation For External Calibration To Propensity Score Analysis, 2014 COBRA

#### Applying Multiple Imputation For External Calibration To Propensity Score Analysis, Yenny Webb-Vargas, Kara E. Rudolph, D. Lenis, Peter Murakami, Elizabeth A. Stuart

*Johns Hopkins University, Dept. of Biostatistics Working Papers*

Although covariate measurement error is likely the norm rather than the exception, methods for handling covariate measurement error in propensity score methods have not been widely investigated. We consider a multiple imputation-based approach that uses an external calibration sample with information on the true and mismeasured covariates, Multiple Imputation for External Calibration (MI-EC), to correct for the measurement error. We investigate the performance of MI-EC using simulation studies. As expected, a naive method that simply uses the covariate measured with error leads to bias in the treatment effect estimate. Another approach that uses only the joint distribution of the true ...

Using Graphs To Characterize Nationwide Physician Referral Networks, 2014 Yale University

#### Using Graphs To Characterize Nationwide Physician Referral Networks, Ding Tong, Shu-Xia Li, Isuru Ranasinghe, Sudhakar Nuti, Hongyu Zhao, Harlan Krumholz

*Yale Day of Data*

AIM:

Evaluating physician referral network characteristics can help to understand how physicians and hospitals interact to provide patient services within the US healthcare system and ultimately how this may influence patient outcomes.

METHOD:

We used the 2012-2013 national Physician Referral data from the Centers for Medicare & Medicaid Services (CMS), which consists of 73,071,804 pairs of referrals from one health provider to another in calendar year 2012 and the first two quarters of year 2013 within 30 days of care. These referrals are from 642,144 physicians and 4,811 hospitals nationwide. We obtained information for each provider, physician ...

Stratified Meta-Analysis To Examine Data Biases In Lung Cancer Studies Of Refinery Workers, 2014 Yale University

#### Stratified Meta-Analysis To Examine Data Biases In Lung Cancer Studies Of Refinery Workers, Sherman Selix

*Yale Day of Data*

Petroleum refineries employ a variety of workers who historically experienced different potentials for asbestos exposure depending on job tasks. Associations between petroleum refinery work and lung cancer related to occupational asbestos exposure have been quantified among various locations, corporations, and time periods. To combine the data from several individual refinery studies and examine an overall effect, a systematic review and stratified meta-analysis was employed. Using set search terms among four databases, 112 potential publications were identified, of which 29 qualified for meta-analysis. Risk estimates and confidence intervals were extracted from these publications to construct four separate datasets. Inverse variance weighting ...
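The inverse variance weighting mentioned in this abstract can be illustrated with a minimal fixed-effect sketch. It assumes each study reports a ratio-scale risk estimate with a 95% confidence interval, recovers the standard error on the log scale from the interval width, and weights each study by the reciprocal of its variance. The function name and the three study values are hypothetical and are not taken from the paper, and the paper's own pooling model may differ (for example, it may use random effects).

```python
import math


def pool_inverse_variance(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of ratio-scale estimates.

    studies: iterable of (estimate, ci_lower, ci_upper) on the ratio scale.
    Returns (pooled_estimate, pooled_lower, pooled_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for estimate, lower, upper in studies:
        log_estimate = math.log(estimate)
        # Standard error recovered from the width of the CI on the log scale.
        std_error = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / std_error ** 2
        weighted_sum += weight * log_estimate
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical (estimate, lower, upper) triples, for illustration only.
hypothetical_studies = [(1.20, 0.90, 1.60), (1.05, 0.80, 1.38), (1.40, 1.00, 1.96)]
estimate, lower, upper = pool_inverse_variance(hypothetical_studies)
print(f"pooled estimate {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

More precise studies (narrower intervals) receive larger weights, so the pooled interval is narrower than that of any single contributing study.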

Online Targeted Learning, 2014 COBRA

#### Online Targeted Learning, Mark J. Van Der Laan, Samuel D. Lendle

*U.C. Berkeley Division of Biostatistics Working Paper Series*

We consider the case in which the data arrive sequentially and can be viewed as a sample of independent and identically distributed observations from a fixed data generating distribution. The goal is to estimate a particular pathwise target parameter of this data generating distribution that is known to be an element of a particular semi-parametric statistical model. We want our estimator to be asymptotically efficient, but we also want it to be computable by updating the current estimator based on the new block of data without having to revisit the past data, so that it is computationally much ...

#### Estimation Of The Overall Treatment Effect In The Presence Of Interference In Cluster-Randomized Trials Of Infectious Disease Prevention, Nicole Bohme Carnegie, Rui Wang, Victor De Gruttola

*Harvard University Biostatistics Working Paper Series*

No abstract provided.

Modeling Count Data; Errata And Comments, 2014 SelectedWorks

#### Modeling Count Data; Errata And Comments, Joseph M. Hilbe

*Joseph M Hilbe*

Modeling Count Data: Errata and Comments PDF. Will be updated on a continuing basis.

Computational Methods For Historical Research On Wikipedia’s Archives, 2014 Chapman University

#### Computational Methods For Historical Research On Wikipedia’s Archives, Jonathan Cohen

*e-Research: A Journal of Undergraduate Work*

This paper presents a novel study of geographic information implicit in the English Wikipedia archive. This project demonstrates a method to extract data from the archive with data mining, map the global distribution of Wikipedia editors through geocoding in GIS, and proceed with a spatial analysis of Wikipedia use in metropolitan cities.

Cox Regression Models With Functional Covariates For Survival Data, 2014 COBRA

#### Cox Regression Models With Functional Covariates For Survival Data, Jonathan E. Gellar, Elizabeth Colantuoni, Dale M. Needham, Ciprian M. Crainiceanu

*Johns Hopkins University, Dept. of Biostatistics Working Papers*

We extend the Cox proportional hazards model to cases when the exposure is a densely sampled functional process, measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time varying coefficients, and missing or unequally-spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge ...

Targeted Learning Of An Optimal Dynamic Treatment, And Statistical Inference For Its Mean Outcome, 2014 COBRA

#### Targeted Learning Of An Optimal Dynamic Treatment, And Statistical Inference For Its Mean Outcome, Mark J. Van Der Laan, Alexander R. Luedtke

*U.C. Berkeley Division of Biostatistics Working Paper Series*

Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial, where subjects are sequentially randomized to a first line and second line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. In this article we consider estimation of an optimal dynamic multiple time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment ...

Comparing Partial Least Square Approaches In Gene-Or Region-Based Association Study For Multiple Quantitative Phenotypes, 2014 Wayne State University

#### Comparing Partial Least Square Approaches In Gene-Or Region-Based Association Study For Multiple Quantitative Phenotypes, Zhongshang Yuan, Xiaoshuai Zhang, Fangyu Li, Jinghua Zhao, Fuzhong Xue

*Human Biology Open Access Pre-Prints*

When thinking quantitatively about complex diseases, there are at least three statistical strategies for association studies: a single SNP on a single trait, a gene or region (with multiple SNPs) on a single trait, and a gene or region on multiple traits. The third is the most general in dissecting the genetic mechanism underlying complex diseases underpinning multiple quantitative traits. Gene- or region-based association methods based on partial least squares (PLS) approaches have been shown to have an apparent power advantage. However, few attempts have been made for multiple quantitative phenotypes or traits underlying a condition or disease, and the performance of various PLS approaches used in association study for ...

Inferences In Log-Rate Models, 2014 Minnesota State University, Mankato

#### Inferences In Log-Rate Models, Herbert C. Heien, William A. Baumann

*Journal of Undergraduate Research at Minnesota State University, Mankato*

Log-Rate models are used in analyzing rates for individuals who are exposed to a risk of having a certain characteristic. The explanatory variables may be categorical or on a continuous scale. In fitting a Log-Rate Model, parameters are estimated and goodness of fit is studied in order to select the model that best fits the data. Here we revisit three aspects of Log-Rate Models using the data set given at the end of the paper. The three aspects are parameter estimation, goodness of fit of the model, and the marginal effects of the factors.
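As a small illustration of parameter estimation in a log-rate model, the sketch below treats the simplest case: a Poisson model with one binary factor, log(expected events) = log(exposure) + b0 + b1·x. In this saturated two-group case the maximum likelihood estimates have a closed form, so no iterative fitting is needed. The function name and the counts are hypothetical and are not the data set from the paper.

```python
import math


def two_group_log_rate(events0, exposure0, events1, exposure1):
    """Closed-form MLEs for log(mu) = log(exposure) + b0 + b1 * x, x in {0, 1}.

    Returns (b0, b1, se_b1), where se_b1 is the large-sample Wald
    standard error of the log rate ratio.
    """
    rate0 = events0 / exposure0
    rate1 = events1 / exposure1
    intercept = math.log(rate0)       # log rate in the reference group
    slope = math.log(rate1 / rate0)   # log rate ratio, group 1 vs group 0
    slope_se = math.sqrt(1.0 / events0 + 1.0 / events1)
    return intercept, slope, slope_se


# Hypothetical counts and person-time, for illustration only.
b0, b1, se_b1 = two_group_log_rate(events0=10, exposure0=1000.0,
                                   events1=30, exposure1=1500.0)
print(f"rate ratio {math.exp(b1):.2f}, log rate ratio SE {se_b1:.3f}")
```

With more factors or continuous covariates the estimates are no longer available in closed form and are usually obtained by fitting a Poisson regression with log(exposure) as an offset.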

Bird Keeping And Lung Cancer, 2014 Minnesota State University, Mankato

#### Bird Keeping And Lung Cancer, Andrew Tackmann, Jonathan Hellman, Jamie Johnson

*Journal of Undergraduate Research at Minnesota State University, Mankato*

Logistic regression is reviewed, covering parameter estimation and inference about the parameters. A contingency table approach to computing goodness of fit in logistic regression is elaborated. An existing data set on a sample of lung cancer patients and a control group is used to apply the procedures discussed. The data reveal that, between the groups considered, the factors ‘bird keeping’ and ‘the number of years of smoking’ are significant as causes of lung cancer.

Simulating Burr Type VII Distributions Through The Method Of L-Moments And L-Correlations, 2014 SelectedWorks

#### Simulating Burr Type VII Distributions Through The Method Of L-Moments And L-Correlations, Mohan D. Pant, Todd C. Headrick

*Mohan Dev Pant*

Burr Type VII, a one-parameter non-normal distribution, is among the less studied distributions, especially in the contexts of statistical modeling and simulation studies. The main purpose of this study is to introduce a methodology for simulating univariate and multivariate Burr Type VII distributions through the method of L-moments and L-correlations. The methodology can be applied in statistical modeling of events in a variety of applied mathematical contexts and Monte Carlo simulation studies. Numerical examples are provided to demonstrate that L-moment-based Burr Type VII distributions are superior to their conventional moment-based analogs in terms of distribution fitting and estimation. Simulation results ...
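For readers unfamiliar with L-moments, the following sketch computes the first two sample L-moments and the L-skewness and L-kurtosis ratios from the standard unbiased probability-weighted-moment estimators. It is a generic illustration, not the Burr Type VII simulation procedure developed in the paper, and the function name is hypothetical.

```python
def sample_l_moments(data):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis.

    Uses the unbiased probability-weighted moments b0..b3 of the sorted sample.
    Needs at least four observations that are not all equal (l2 must be nonzero).
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")
    b = [0.0, 0.0, 0.0, 0.0]
    for i, value in enumerate(x):  # i is the zero-based rank
        weight = 1.0
        for r in range(4):
            b[r] += weight * value
            if r < 3:
                # Extend the weight to i(i-1)...(i-r) / ((n-1)(n-2)...(n-1-r)).
                weight *= (i - r) / (n - 1 - r)
    b0, b1, b2, b3 = (total / n for total in b)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


l1, l2, t3, t4 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(l1, l2, t3, t4)
```

For a symmetric sample such as the one above, the L-skewness is zero up to floating-point error.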

The Doubly Adaptive Lasso Methods For Time Series Analysis, 2014 Western University

#### The Doubly Adaptive Lasso Methods For Time Series Analysis, Zi Zhen Liu

*University of Western Ontario - Electronic Thesis and Dissertation Repository*

In this thesis, we propose a systematic approach called the doubly adaptive LASSO tailored to time series analysis, which includes four specific methods for four time series models, respectively:

The PAC-weighted adaptive LASSO for univariate autoregressive (AR) models. Although the LASSO methodology has been applied to AR models, the existing methods in the literature ignore the temporal dependence information embedded in AR time series data. Consequently, the methods may not reflect the characteristics of underlying AR processes, especially the lag order of AR models. The PAC-weighted adaptive LASSO incorporates the partial autocorrelation (PAC) into the adaptive LASSO weights. The PAC-weighted ...

Perfect And Nearly Perfect Sampling Of Work-Conserving Queues, 2014 Western University

#### Perfect And Nearly Perfect Sampling Of Work-Conserving Queues, Yaofei Xiong

*University of Western Ontario - Electronic Thesis and Dissertation Repository*

We present sampling-based methods to treat work-conserving queueing systems. A variety of models are studied. Besides First Come First Served (FCFS) queues, much effort is devoted to the accumulating priority queue (APQ), where a customer accumulates priority linearly while waiting. APQs have Poisson arrivals, multi-class customers with corresponding service durations, and single or multiple servers.

Perfect sampling is an approach to draw a sample directly from the steady-state distribution of a Markov chain without explicitly solving for it. Statistical inference can be conducted without initialization bias. If an error can be tolerated within some limit, i.e. the total ...
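One well-known way to carry out perfect sampling is coupling from the past. The sketch below applies it to a toy monotone chain, a reflecting random walk on a small finite state space, rather than to the queueing models studied in the thesis. The chain, the function names, and the parameter values are all assumptions made for illustration.

```python
import random


def step(state, u, n_states, p_up):
    """Monotone update rule: move up if u < p_up, otherwise down, reflecting at the ends."""
    if u < p_up:
        return min(state + 1, n_states - 1)
    return max(state - 1, 0)


def perfect_sample(n_states, p_up, rng):
    """Draw one exact stationary sample by coupling from the past."""
    draws = []  # draws[k] drives the transition at time -(k + 1)
    horizon = 1
    while True:
        # Extend the randomness further into the past, reusing earlier draws.
        while len(draws) < horizon:
            draws.append(rng.random())
        low, high = 0, n_states - 1
        # Run the lowest and highest starting states from time -horizon to 0.
        for k in range(horizon - 1, -1, -1):
            low = step(low, draws[k], n_states, p_up)
            high = step(high, draws[k], n_states, p_up)
        if low == high:
            # By monotonicity, every starting state has coalesced to this value.
            return low
        horizon *= 2


rng = random.Random(2014)
samples = [perfect_sample(n_states=5, p_up=0.3, rng=rng) for _ in range(1000)]
print(sum(samples) / len(samples))
```

Reusing the same random draws for the most recent steps each time the horizon is doubled is what makes the returned state an exact stationary draw; generating fresh draws on each attempt would bias the sample.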

Mathematical Modeling And Simulation Of Multiallelic Migration-Selection Models, 2014 Minnesota State University, Mankato

#### Mathematical Modeling And Simulation Of Multiallelic Migration-Selection Models, Chad N. Vidden

*Journal of Undergraduate Research at Minnesota State University, Mankato*

Population ecology is concerned with the growth and decay of specific populations. This field has a variety of applications ranging from evolution and survival at the environmental level to the spread of infectious disease at the cellular and molecular levels. Many ecological circumstances require the use of mathematical methods and reasoning in order to acquire better knowledge of the issue at hand. This study considered and analyzed multiple different mathematical models of population dynamics along with their purposes. This foundation was then applied in order to explore the migration of populations from one isolated region to another along with the ...

Identification Of Informativeness In Text Using Natural Language Stylometry, 2014 Western University

#### Identification Of Informativeness In Text Using Natural Language Stylometry, Rushdi Shams

*University of Western Ontario - Electronic Thesis and Dissertation Repository*

In this age of information overload, one experiences a rapidly growing over-abundance of written text. To assist with handling this bounty, this plethora of texts is now widely used to develop and optimize statistical natural language processing (NLP) systems. Surprisingly, the use of more fragments of text to train these statistical NLP systems may not necessarily lead to improved performance. We hypothesize that those fragments that help the most with training are those that contain the desired information. Therefore, determining informativeness in text has become a central issue in our view of NLP. Recent developments in this field have spawned ...