Open Access. Powered by Scholars. Published by Universities.®
- Keyword
- Counting process (6)
- Model selection (6)
- Causal inference (5)
- Censored data (5)
- Cross-validation (5)
- Estimating equation (5)
- Prediction (5)
- Genetics (4)
- Risk (4)
- Semiparametric model (4)
- Censoring (3)
- Counterfactual (3)
- Longitudinal data (3)
- Loss function (3)
- Mathematics (3)
- Monte Carlo Simulation (3)
- Asymptotic Theory (2)
- Confounding (2)
- Correlation (2)
- Cross validation (2)
- Density estimation (2)
- Disease screening (2)
- Double robust estimation (2)
- Double robustness (2)
- Estimation (2)
- Family-wise error rate control (2)
- Frailty (2)
- G-computation estimation (2)
- Gene expression (2)
- Generalized estimating equations (2)
- Publication
- U.C. Berkeley Division of Biostatistics Working Paper Series (39)
- Harvard University Biostatistics Working Paper Series (27)
- The University of Michigan Department of Biostatistics Working Paper Series (13)
- Johns Hopkins University, Dept. of Biostatistics Working Papers (10)
- COBRA Preprint Series (7)
- UW Biostatistics Working Paper Series (7)
- Basic Science Engineering (4)
- FIU Electronic Theses and Dissertations (4)
- Data Science and Data Mining (2)
- CHIP Documents (1)
- Community & Environmental Health Faculty Publications (1)
- Department of Management: Faculty Publications (1)
- Department of Statistics: Dissertations, Theses, and Student Work (1)
- Georgetown Law Faculty Publications and Other Works (1)
- Honors Projects in Mathematics (1)
- Honors Scholar Theses (1)
- Masters Theses & Specialist Projects (1)
- Student Publications & Research (1)
- Student Scholar Symposium Abstracts and Posters (1)
Articles 1 - 30 of 123
Full-Text Articles in Statistical Models
Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe
Data Science and Data Mining
This project estimates a regression model to predict the superconducting critical temperature based on variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, gives a good predictive model with low prediction error (MSE). Variables extracted from atomic radius, valence, atomic mass, and thermal conductivity appeared to contribute most to the predictive model.
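The stepwise selection described above can be sketched in miniature. This is a minimal forward-stepwise procedure that greedily adds the feature reducing training MSE the most; the feature names, synthetic data, and stopping rule are illustrative assumptions, not the project's actual dataset or software.

```python
# Sketch of forward stepwise variable selection for a linear regression model.
# Pure Python: OLS via the normal equations, greedy forward feature addition.

def ols_fit(X, y):
    """Solve the normal equations (X'X)b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(len(X))) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for p in range(k):                       # forward elimination with pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        b[p], b[piv] = b[piv], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for c in range(p, k):
                A[r][c] -= f * A[p][c]
            b[r] -= f * b[p]
    beta = [0.0] * k
    for p in reversed(range(k)):             # back substitution
        beta[p] = (b[p] - sum(A[p][c] * beta[c] for c in range(p + 1, k))) / A[p][p]
    return beta

def mse(X, y, beta):
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))) ** 2
               for i in range(len(y))) / len(y)

def forward_stepwise(features, y, names):
    """Greedily add the feature whose inclusion most reduces training MSE."""
    chosen, current = [], float("inf")
    while True:
        best = None
        for j in range(len(names)):
            if j in chosen:
                continue
            cols = chosen + [j]
            X = [[1.0] + [features[i][c] for c in cols] for i in range(len(y))]
            m = mse(X, y, ols_fit(X, y))
            if best is None or m < best[1]:
                best = (j, m)
        if best is None or best[1] >= current - 1e-9:    # no improvement: stop
            return [names[j] for j in chosen]
        chosen.append(best[0])
        current = best[1]
```

On synthetic data where one feature drives the response, the procedure picks that feature first; a real analysis would use a held-out criterion (AIC, BIC, or cross-validated MSE) rather than training MSE.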
Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe
Data Science and Data Mining
Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, this act has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TF-IDF) with some popular machine learning …
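The TF-IDF featurization mentioned above can be shown in a few lines. This is a minimal pure-Python sketch using the common smoothed-idf convention (the one scikit-learn uses); a real pipeline would use a library vectorizer and feed the weights to a classifier.

```python
# Minimal TF-IDF: term frequency within a document, scaled by smoothed
# inverse document frequency across the corpus.
import math
from collections import Counter

def tfidf(corpus):
    """Return one {term: weight} dict per document."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf = ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return out
```

Terms that appear in few documents (the ones most likely to discriminate abusive from benign messages) receive higher weights than terms common to every document.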
UConn Baseball Batting Order Optimization, Gavin Rublewski
Honors Scholar Theses
Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …
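The simulate-and-average loop at the heart of that approach can be sketched as a toy: simulate many games for a lineup and average the runs. The on-base probabilities and the crude "bases loaded, next runner scores" rule are deliberate simplifications for illustration, not the thesis's actual simulator.

```python
# Toy Monte Carlo batting-order evaluator.
import random

def simulate_game(lineup, innings=9):
    """lineup: list of 9 on-base probabilities; returns runs scored."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, runners = 0, 0
        while outs < 3:
            if random.random() < lineup[batter]:   # batter reaches base
                runners += 1
                if runners > 3:                    # bases loaded: a run scores
                    runs += 1
                    runners = 3
            else:
                outs += 1
            batter = (batter + 1) % len(lineup)    # order carries across innings
    return runs

def expected_runs(lineup, n_games=2000):
    """Monte Carlo estimate of mean runs per game for this order."""
    return sum(simulate_game(lineup) for _ in range(n_games)) / n_games
```

Ranking candidate orders then amounts to calling `expected_runs` on each permutation of interest and comparing the averages.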
A Simple Algorithm For Generating A New Two Sample Type-Ii Progressive Censoring With Applications, E. M. Shokr, Rashad Mohamed El-Sagheer, Mahmoud Mansour, H. M. Faied, B. S. El-Desouky
Basic Science Engineering
In this article, we introduce a simple algorithm for generating a new type-II progressive censoring scheme for two samples. The proposed algorithm can be applied to any continuous probability distribution. Moreover, the model description and necessary assumptions are discussed. In addition, the steps of the generation algorithm, along with programming steps, are illustrated on a real example. Inference for two Weibull-Fréchet populations is discussed under the proposed algorithm. Both classical and Bayesian inferential approaches for the distribution parameters are discussed. Furthermore, approximate confidence intervals are constructed based on the asymptotic distribution of the maximum …
A Monte Carlo Analysis Of Standard Error-Based Methods For Computing Confidence Intervals, Elayna Wichert
Masters Theses & Specialist Projects
The objective of this study is to empirically test existing techniques to calculate the likely range of values for a Classical Test Theory true score given an observed score. The traditional method for forming these confidence intervals has used the standard error of measurement (SEM) as the basis for this confidence interval. An alternate equation, the standard error of estimate (SEE), has been recommended in place of the SEM for this purpose, yet it remains overlooked in the field of psychometrics. It is important that the correct equation be used in various applications in personnel psychology. Monte Carlo analyses were …
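The two interval-width formulas being compared have simple closed forms under standard Classical Test Theory definitions (sd = observed-score standard deviation, rxx = reliability); the sketch below states them, hedged as the textbook definitions rather than the thesis's exact implementation.

```python
# Standard error of measurement (SEM) vs. standard error of estimate (SEE),
# the two bases for true-score confidence intervals compared in the study.
import math

def sem(sd, rxx):
    """Standard error of measurement: sd * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - rxx)

def see(sd, rxx):
    """Standard error of estimate (predicting the true score from the
    observed score): sd * sqrt(rxx * (1 - rxx))."""
    return sd * math.sqrt(rxx * (1 - rxx))
```

Since SEE = sqrt(rxx) * SEM, SEE-based intervals are always narrower; they are also conventionally centered on the regressed true-score estimate rather than the observed score, which is part of why the choice of equation matters in applied personnel work.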
Inferences For Weibull-Gamma Distribution In Presence Of Partially Accelerated Life Test, Mahmoud Mansour, M A W Mahmoud Prof., Rashad El-Sagheer
Basic Science Engineering
In this paper, the aim is to derive point and interval estimates for the parameters of the Weibull-Gamma distribution (WGD) using a progressively Type-II censored (PROG-II-C) sample under a step-stress partially accelerated life test (SSPALT) model. The maximum likelihood (ML), Bayes, and four parametric bootstrap methods are used to obtain point estimates for the distribution parameters and the acceleration factor. Furthermore, the approximate confidence intervals (ACIs), four bootstrap confidence intervals, and credible intervals of the estimators have been obtained. The results of Bayes estimators are computed under the squared error loss (SEL) function using Markov Chain Monte Carlo (MCMC) …
Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan
COBRA Preprint Series
One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, which provide insight into the underlying disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …
Non Parametric Test For Testing Exponentiality Against Exponential Better Than Used In Laplace Transform Order, Mahmoud Mansour, M A W Mahmoud Prof.
Basic Science Engineering
In this paper, a test statistic for testing exponentiality against the exponential better than used in Laplace transform order (EBUL) class, based on the Laplace transform technique, is proposed. Pitman’s asymptotic efficiency of our test is calculated and compared with other tests. The percentiles of this test are tabulated. The powers of the test are estimated for distributions commonly used in aging problems. In the case of censored data, our test is applied and the percentiles are also calculated and tabulated. Finally, real examples in different areas are utilized as practical applications for the proposed test.
Controlling For Confounding Via Propensity Score Methods Can Result In Biased Estimation Of The Conditional Auc: A Simulation Study, Hadiza I. Galadima, Donna K. Mcclish
Community & Environmental Health Faculty Publications
In the medical literature, there has been an increased interest in evaluating association between exposure and outcomes using nonrandomized observational studies. However, because assignments to exposure are not random in observational studies, comparisons of outcomes between exposed and nonexposed subjects must account for the effect of confounders. Propensity score methods have been widely used to control for confounding, when estimating exposure effect. Previous studies have shown that conditioning on the propensity score results in biased estimation of conditional odds ratio and hazard ratio. However, research is lacking on the performance of propensity score methods for covariate adjustment when estimating the …
On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar
FIU Electronic Theses and Dissertations
Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …
Gilmore Girls And Instagram: A Statistical Look At The Popularity Of The Television Show Through The Lens Of An Instagram Page, Brittany Simmons
Student Scholar Symposium Abstracts and Posters
After going on the Warner Brothers Tour in December of 2015, I created a Gilmore Girls Instagram account. This account, which started off as a way for me to create edits of the show and post my photos from the tour, turned into something bigger than I ever could have imagined. In just over a year I have over 55,000 followers. I post content including revival news, merchandise, and edits of the show that have been featured in Entertainment Weekly, Bustle, E! News, People Magazine, Yahoo News, & GilmoreNews.
I created a dataset of qualitative and quantitative outcomes from my …
Inference On The Stress-Strength Model From Weibull Gamma Distribution, Mahmoud Mansour, Rashad El-Sagheer, M. A. W. Mahmoud Prof.
Basic Science Engineering
No abstract provided.
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …
Bootstrapping Vs. Asymptotic Theory In Property And Casualty Loss Reserving, Andrew J. Difronzo Jr.
Honors Projects in Mathematics
One of the key functions of a property and casualty (P&C) insurance company is loss reserving, which calculates how much money the company should retain in order to pay out future claims. Most P&C insurance companies use non-stochastic (non-random) methods to estimate these future liabilities. However, future loss data can also be projected using generalized linear models (GLMs) and stochastic simulation. Two simulation methods that will be the focus of this project are: bootstrapping methodology, which resamples the original loss data (creating pseudo-data in the process) and fits the GLM parameters based on the new data to estimate the sampling …
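The resampling half of that comparison can be shown generically. This sketch resamples raw loss amounts to form a percentile bootstrap interval for a summary statistic; actual reserving bootstraps GLM residuals from a development triangle, so treat this purely as the mechanics of "resample, refit, read off percentiles."

```python
# Percentile bootstrap confidence interval for an arbitrary statistic.
import random

def bootstrap_ci(losses, stat, n_boot=5000, alpha=0.05):
    """Resample with replacement, recompute stat, return the
    (alpha/2, 1 - alpha/2) percentile interval."""
    stats = []
    for _ in range(n_boot):
        resample = [random.choice(losses) for _ in losses]   # pseudo-data
        stats.append(stat(resample))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2))]
    return lo, hi
```

The width of this interval is the bootstrap's answer to the same question asymptotic theory answers with a variance formula, which is exactly the trade-off the project examines.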
Best Practice Recommendations For Data Screening, Justin A. Desimone, Peter D. Harms, Alice J. Desimone
Department of Management: Faculty Publications
Survey respondents differ in their levels of attention and effort when responding to items. There are a number of methods researchers may use to identify respondents who fail to exert sufficient effort in order to increase the rigor of analysis and enhance the trustworthiness of study results. Screening techniques are organized into three general categories, which differ in impact on survey design and potential respondent awareness. Assumptions and considerations regarding appropriate use of screening techniques are discussed along with descriptions of each technique. The utility of each screening technique is a function of survey design and administration. Each technique has …
An Alternative Goodness-Of-Fit Test For Normality With Unknown Parameters, Weiling Shi
FIU Electronic Theses and Dissertations
Goodness-of-fit tests have been studied by many researchers. Among them, an alternative statistical test for uniformity was proposed by Chen and Ye (2009). The test was used by Xiong (2010) to test normality for the case that both location parameter and scale parameter of the normal distribution are known. The purpose of the present thesis is to extend the result to the case that the parameters are unknown. A table for the critical values of the test statistic is obtained using Monte Carlo simulation. The performance of the proposed test is compared with the Shapiro-Wilk test and the Kolmogorov-Smirnov test. …
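The table of critical values mentioned above is built by a standard recipe: simulate the statistic under the null many times and read off an upper percentile. The sketch uses a simple KS-type distance for illustration; it is not the Chen and Ye (2009) statistic from the thesis.

```python
# Monte Carlo critical values for a goodness-of-fit statistic, in miniature.
import random

def ks_uniform_stat(sample):
    """Max distance between the empirical CDF and the Uniform(0,1) CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def mc_critical_value(n, alpha=0.05, n_sim=10000):
    """Simulate the null distribution of the statistic for sample size n
    and return the (1 - alpha) quantile as the critical value."""
    sims = sorted(ks_uniform_stat([random.random() for _ in range(n)])
                  for _ in range(n_sim))
    return sims[int((1 - alpha) * n_sim)]
```

Tabulating `mc_critical_value(n)` over a grid of sample sizes yields exactly the kind of critical-value table the thesis reports, with accuracy controlled by `n_sim`.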
Asymmetric Empirical Similarity, Joshua C. Teitelbaum
Georgetown Law Faculty Publications and Other Works
The paper offers a formal model of analogical legal reasoning and takes the model to data. Under the model, the outcome of a new case is a weighted average of the outcomes of prior cases. The weights capture precedential influence and depend on fact similarity (distance in fact space) and precedential authority (position in the judicial hierarchy). The empirical analysis suggests that the model plausibly describes the time series of U.S. maritime salvage cases. Moreover, the results evince that prior cases decided by inferior courts have less influence than prior cases decided by superior courts.
Meta-Analysis Of Social-Personality Psychological Research, Blair T. Johnson, Alice H. Eagly
CHIP Documents
This publication provides a contemporary treatment of the subject of meta-analysis in relation to social-personality psychology. Meta-analysis literally refers to the statistical pooling of the results of independent studies on a given subject, although in practice it refers as well to other steps of research synthesis, including defining the question under investigation, gathering all available research reports, coding of information about the studies and their effects, and interpretation/dissemination of results. Discussed as well are the hallmarks of high-quality meta-analyses.
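The "statistical pooling" step can be stated concretely in its simplest, fixed-effect form: an inverse-variance weighted average of the study effect sizes. Random-effects meta-analysis, which the social-personality literature often requires, adds a between-study variance component on top of this; the sketch covers only the fixed-effect core.

```python
# Fixed-effect (inverse-variance) pooling of study effect sizes.
def pooled_effect(effects, variances):
    """Return the pooled effect estimate and its variance.
    Each study is weighted by the reciprocal of its sampling variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total
    return est, 1.0 / total
```

With equal variances the pooled estimate reduces to the simple mean, and more precise studies (smaller variance) pull the estimate toward their own effect.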
Inferences About Parameters Of Trivariate Normal Distribution With Missing Data, Xing Wang
FIU Electronic Theses and Dissertations
The multivariate normal distribution is commonly encountered in many fields, and missing values are a frequent issue in practice. The purpose of this research was to estimate the parameters of a three-dimensional normal distribution with permutation-symmetric covariance, using complete data and all possible patterns of incomplete data. In this study, MLEs with missing data were derived, and the properties of the MLEs as well as the sampling distributions were obtained. A Monte Carlo simulation study was used to evaluate the performance of the considered estimators for both cases, when ρ was known and unknown. All results indicated that, compared to estimators in the …
Statistical Inference For Data Adaptive Target Parameters, Mark J. Van Der Laan, Alan E. Hubbard, Sara Kherad Pajouh
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-size subsamples and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and the corresponding complementary parameter-generating sample that is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample into a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
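The splitting scheme described above can be sketched concretely: each of the V folds serves once as the estimation sample, with the remaining folds pooled into the parameter-generating sample that defines the (data-adaptive) target parameter. The interleaved fold assignment below is an illustrative choice; any partition into near-equal subsamples works.

```python
# V-fold sample splitting into (estimation, parameter-generating) pairs.
def v_fold_splits(indices, V):
    """Return V pairs: (estimation sample, complementary
    parameter-generating sample), covering every observation once."""
    folds = [indices[i::V] for i in range(V)]   # V near-equal subsamples
    return [
        (folds[v],                                              # estimation
         [i for w, f in enumerate(folds) if w != v for i in f]) # generating
        for v in range(V)
    ]
```

The key point of the paper's construction is that the target parameter is defined on the parameter-generating sample and then estimated on the disjoint estimation sample, so the data-adaptive choice of parameter does not invalidate inference.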
Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …
Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To Aids Study, Hong Zhu, Mei-Cheng Wang
Johns Hopkins University, Dept. of Biostatistics Working Papers
In disease surveillance systems or registries, bivariate survival data are typically collected under interval sampling. This refers to a situation where entry into a registry occurs at the time of the first failure event (e.g., HIV infection) within a calendar time interval, the time of the initiating event (e.g., birth) is retrospectively identified for all cases in the registry, and the second failure event (e.g., death) is subsequently observed during follow-up. Sampling bias is induced because the data are collected conditional on the first failure event occurring within a time interval. Consequently, the …
Effectively Selecting A Target Population For A Future Comparative Study, Lihui Zhao, Lu Tian, Tianxi Cai, Brian Claggett, L. J. Wei
Harvard University Biostatistics Working Paper Series
When comparing a new treatment with a control in a randomized clinical study, the treatment effect is generally assessed by evaluating a summary measure over a specific study population. The success of the trial heavily depends on the choice of such a population. In this paper, we show a systematic, effective way to identify a promising population, for which the new treatment is expected to have a desired benefit, using the data from a current study involving similar comparator treatments. Specifically, with the existing data we first create a parametric scoring system using multiple covariates to estimate subject-specific treatment differences. …
On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi
COBRA Preprint Series
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to yield a parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
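The multiplicative updates referred to above have a compact form for the Euclidean objective ||V − WH||²: H ← H ∘ (WᵀV)/(WᵀWH) and W ← W ∘ (VHᵀ)/(WHHᵀ), where ∘ and / are elementwise. The pure-Python sketch below is unoptimized and exists only to show the update structure; any practical use would rely on a numerical library.

```python
# Lee-Seung multiplicative updates for NMF with the Euclidean objective.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Factor nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    random.seed(0)                                   # reproducible init
    n, m = len(V), len(V[0])
    W = [[random.random() for _ in range(rank)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W'V) / (W'WH), elementwise; eps guards against /0
        WtV = matmul(transpose(W), V)
        WtWH = matmul(transpose(W), matmul(W, H))
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (VH') / (WHH'), elementwise
        VHt = matmul(V, transpose(H))
        WHHt = matmul(matmul(W, H), transpose(H))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H
```

Because every update is a nonnegative rescaling, W and H stay nonnegative throughout, which is what forces the purely additive, parts-based structure the abstract describes.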
Estimating Subject-Specific Treatment Differences For Risk-Benefit Assessment With Competing Risk Event-Time Data, Brian Claggett, Lihui Zhao, Lu Tian, Davide Castagno, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Flipping The Winner Of A Poset Game, Adam O. Kalinich '12
Student Publications & Research
Partially-ordered set games, also called poset games, are a class of two-player combinatorial games. The playing field consists of a set of elements, some of which are greater than other elements. Two players take turns removing an element and all elements greater than it, and whoever takes the last element wins. Examples of poset games include Nim and Chomp. We investigate the complexity of computing which player of a poset game has a winning strategy. We give an inductive procedure that modifies poset games to change the nim-value which informally captures the winning strategies in the game. For a generic …
Minimum Description Length And Empirical Bayes Methods Of Identifying Snps Associated With Disease, Ye Yang, David R. Bickel
COBRA Preprint Series
The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.
We …
Stratifying Subjects For Treatment Selection With Censored Event Time Data From A Comparative Study, Lihui Zhao, Tianxi Cai, Lu Tian, Hajime Uno, Scott D. Solomon, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Principled Sure Independence Screening For Cox Models With Ultra-High-Dimensional Covariates, Sihai Dave Zhao, Yi Li
Harvard University Biostatistics Working Paper Series
No abstract provided.