Applied Statistics Commons

Open Access. Powered by Scholars. Published by Universities.

30 Institutions 390 Full-Text Articles 412 Authors 85,002 Downloads

Recent Articles in Applied Statistics

Subsemble: An Ensemble Method For Combining Subset-Specific Algorithm Fits, Stephanie Sapp, Mark J. van der Laan, John Canny COBRA

U.C. Berkeley Division of Biostatistics Working Paper Series

Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be ...
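The procedure described in this abstract (partition the data, fit on each subset, combine the fits using V-fold cross-validation) can be sketched roughly as follows. This is one reading of the abstract, not the authors' implementation: the least-squares underlying algorithm, the least-squares combiner, the function names, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Underlying algorithm (assumed here): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def subsemble(X, y, n_subsets=3, n_folds=5, seed=0):
    """Hypothetical sketch of a subset ensemble combined via V-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pos = rng.permutation(n)
    subset = pos % n_subsets               # subset label of each observation
    fold = (pos // n_subsets) % n_folds    # folds are balanced within subsets
    # Z[i, j]: prediction for observation i from the subset-j fit that
    # never saw observation i's fold.
    Z = np.zeros((n, n_subsets))
    for v in range(n_folds):
        held = fold == v
        for j in range(n_subsets):
            train = (subset == j) & ~held
            Z[held, j] = fit_linear(X[train], y[train])(X[held])
    # Combiner (assumed): least-squares weights on the cross-validated fits.
    weights, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Final subset-specific fits use every observation in each subset.
    fits = [fit_linear(X[subset == j], y[subset == j]) for j in range(n_subsets)]
    def predict(Xn):
        return np.column_stack([f(Xn) for f in fits]) @ weights
    return predict, weights

# Simulated data, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)
predict, weights = subsemble(X, y)
```

Learning the combination weights from held-out predictions keeps a subset-specific fit from being rewarded for overfitting its own training observations.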


Predicting Customer Satisfaction From Dental Implants Perception Data, Omnya Elmassad McMaster University

Open Access Dissertations and Theses

In recent years, measuring customer satisfaction has become one of the key concerns of market research studies. One of the basic features of leading companies is their success in fulfilling their customers’ demands. For that reason, companies attempt to find out what essential factors dominate their customers’ purchasing habits.

Millennium Research Group (MRG), a global authority on medical technology market intelligence, uses a web-based survey tool to collect information about customers’ level of satisfaction. One of their surveys is designed to gather information about the practitioner’s level of satisfaction with different brands of dental implants. The Dental ...


Assessment Of Tillage Practices Using Landsat-TM 5 In Nebraska, Sonisa Sharma University of Nebraska - Lincoln

Dissertations & Theses in Natural Resources

Tillage management practices are an important component of crop production and of federal and state conservation efforts and crop subsidy programs. Crop residue created by conservation tillage reduces soil erosion and reduces evaporation from exposed soil. Agro-hydrological models require information on tillage practices to estimate their impacts on soil-water-holding capacity, total evapotranspiration, carbon sequestration, water runoff and water and wind erosion for agricultural lands. Classification of tillage practices using remote sensing offers promise for the rapid collection of tillage information on individual fields over large areas. Using satellite imagery proves to be challenging due to the similarity in spectral signatures ...


From Unbiased Numerical Estimates To Unbiased Interval Estimates, Baokun Li, Gang Xiang, Vladik Kreinovich, Panagios Moscopoulos University of Texas at El Paso

Departmental Technical Reports (CS)

One of the main objectives of statistics is to estimate the parameters of a probability distribution based on a sample taken from this distribution. Of course, since the sample is finite, the estimate X is, in general, different from the actual value x of the corresponding parameter. What we can require is that the corresponding estimate is unbiased, i.e., that the mean value of the difference X - x is equal to 0: E[X] = x. In some problems, unbiased estimates are not possible. We show that in some such problems, it is possible to have interval unbiased estimates, i ...
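The unbiasedness condition E[X] = x can be checked numerically. The sketch below is an illustration that is not taken from the report: it assumes normally distributed data and compares two estimates of the variance, one dividing by n (biased, with expected bias of minus the true variance over n) and one dividing by n - 1 (unbiased).

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0          # actual parameter value x (assumed for the demo)
n, reps = 5, 200_000    # sample size and number of simulated samples

samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))

# Two estimates X of the variance, computed for every simulated sample.
biased = samples.var(axis=1, ddof=0)     # divides by n
unbiased = samples.var(axis=1, ddof=1)   # divides by n - 1

# Monte Carlo approximation of E[X] - x for each estimator.
bias_biased = biased.mean() - true_var       # theory: -true_var / n
bias_unbiased = unbiased.mean() - true_var   # theory: 0

print(f"bias with divisor n:     {bias_biased:.3f}")
print(f"bias with divisor n - 1: {bias_unbiased:.3f}")
```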


Significant Themes In 19th-Century Literature, Matthew L. Jockers, David Mimno University of Nebraska - Lincoln

Faculty Publications -- Department of English

External factors such as author gender, author nationality, and date of publication affect both the choice of literary themes in novels and the expression of those themes, but the extent of this association is difficult to quantify. In this work, we apply statistical methods to identify and extract hundreds of "topics" from a corpus of 3,346 works of 19th-century British, Irish, and American fiction. We use these topics as a measurable, data-driven proxy for literary themes. External factors may predict fluctuations in the use of themes and the individual word choices within themes. We use topics to measure the ...


A Bayesian Regression Tree Approach To Identify The Effect Of Nanoparticles Properties On Toxicity Profiles, Cecile Low-Kam, Haiyuan Zhang, Zhaoxia Ji, Tian Xia, Jeffrey I. Zinc, Andre Nel, Donatello Telesca COBRA

COBRA Preprint Series

We introduce a Bayesian multiple regression tree model to characterize relationships between physico-chemical properties of nanoparticles and their in-vitro toxicity over multiple doses and times of exposure. Unlike conventional models that rely on data summaries, our model solves the low sample size issue and avoids arbitrary loss of information by combining all measurements from a general exposure experiment across doses, times of exposure, and replicates. The proposed technique integrates Bayesian trees for modeling threshold effects and interactions, and penalized B-splines for dose and time-response surfaces smoothing. The resulting posterior distribution is sampled via a Markov Chain Monte Carlo algorithm. This ...


Do Non-Response Follow-Ups Improve Or Reduce Data Quality?: A Review Of The Existing Literature, Kristen Olson University of Nebraska - Lincoln

Sociology Department, Faculty Publications

The paper systematically reviews existing literature on the relationship between the level of effort to recruit a sampled person and the measurement quality of survey data. Hypotheses proposed for this relationship are reviewed. Empirical findings for the relationship between level of effort as measured by paradata (the number of follow-up attempts, refusal conversion and time in the field) and question-specific item non-response rates, aggregate measures of item non-response rates, response accuracy and various measurement errors on attitudinal questions are examined through a qualitative review.


Missing At Random And Ignorability For Inferences About Subsets Of Parameters With Missing Data, Roderick J. Little, Sahar Zanganeh COBRA

The University of Michigan Department of Biostatistics Working Paper Series

For likelihood-based inferences from data with missing values, Rubin (1976) showed that the missing data mechanism can be ignored when (a) the missing data are missing at random (MAR), in the sense that missingness does not depend on the missing values after conditioning on the observed data, and (b) the parameters of the data model and the missing-data mechanism are distinct; that is, there are no a priori ties, via parameter space restrictions or prior distributions, between the parameters of the data model and the parameters of the model for the mechanism. Rubin described (a) and (b) as the "weakest ...


Patron-Driven Acquisition And Circulation At An Academic Library: Interaction Effects And Circulation Performance Of Print Books Acquired Via Librarians’ Orders, Approval Plans, And Patrons’ Interlibrary Loan Requests, David Tyler, Christina D. Falci, Joyce C. Melvin, MaryLou Epp, Anita M. Kreps University of Nebraska - Lincoln

Faculty Publications, UNL Libraries

Numerous publications on patron-driven acquisition (PDA) for print books and similar materials have reported that patron-requested materials circulate more. Tying circulation to selector may fail to address the complex of factors that contributes to items’ circulation. In the present study, the authors revisit a PDA program’s data to determine whether PDA print books’ circulation advantage persists when the potential interactions of several additional variables are taken into account. As with prior studies, library patrons were significantly better predictors of circulation than were librarians or approval plans. However, librarians proved to be significantly better predictors than were approval ...


Environmentally Friendly Sizing Agent From Corn Distillers Dried Grains, Yue Zhang University of Nebraska - Lincoln

Open Access Theses and Dissertations from the College of Education and Human Sciences

Distillers dried grains (DDGS), the coproducts of corn ethanol production, were used as a textile sizing agent on cotton, polyester and polyester/cotton blends in an effort to find inexpensive and biodegradable alternatives to sizing agents such as poly(vinyl alcohol) that are currently used. Although DDGS is an inexpensive, biodegradable and abundant co-product, it has limited industrial applications. DDGS is a mixture of carbohydrates, proteins and oil which are used as sizing agents or as size additives. The effects of DDGS extraction conditions on sizing evaluation parameters such as fiber adhesion, film properties, viscosity and fabric abrasion were studied ...


Global Quantitative Assessment Of The Colorectal Polyp Burden In, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi The University of Texas

Jeffrey S. Morris

Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.

Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.

Design: A single-arm, phase II trial.

Patients: Twenty-seven patients with FAP.

Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.

Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were ...


Analysis Of Alcohol Use Among Pregnant Women In San Luis Obispo County, Samantha Law California Polytechnic State University

Statistics

Drinking alcohol during pregnancy is harmful to the fetus and can lead to serious alcohol-related developmental birth defects. Utilizing prenatal screening, such as the 4P’s Plus© screening tool, during a woman’s first prenatal doctor’s visit can help educate women and reduce continued alcohol use during pregnancy. Currently the CDC reports that 1 in 13 women in the US drink alcohol while pregnant, compared to local reports that 1 in 3 women in San Luis Obispo County continue to drink alcohol during pregnancy. A primary concern for many local county health care experts and organizations is to raise ...


The Implementation Of The Shear Correlation Function And The Matter Power Spectrum In R, Allison A. Scheppelmann, Deborah J. Bard California Polytechnic State University

STEM Teacher and Researcher (STAR) Program Posters

Weak gravitational lensing is an important tool in understanding the large-scale structure of the universe. Two components in understanding the effect of weak gravitational lensing are the shear correlation function and the matter power spectrum. The calculation of these values is often complicated and time consuming. To decrease the cost of these calculations, they were implemented in R using parallelization. As a result, the calculations complete faster, and the process can be easily changed to fit the needs of each researcher using the algorithms created in R.


Multitarget Tracking Using Multistatic Sensors, Maheswaran Subramaniam McMaster University

Open Access Dissertations and Theses

In this thesis, the problem of multitarget tracking in multistatic sensor networks is studied. The thesis focuses on tracking airborne targets by utilizing transmitters of opportunity in the surveillance region. The Passive Coherent Location (PCL) system, which uses existing commercial signals (e.g., FM broadcast, digital TV) as the illuminators of opportunity for target tracking, is an emerging technology in air defence systems. PCL systems have many advantages over conventional radar systems, such as low cost, covert operation and low vulnerability to electronic countermeasures.

Another opportunistic signal available in the surveillance region is the multipath signal. In this thesis ...


Robust Estimation Of Autoregressive Conditional Duration Models, Rola S. El Sebai McMaster University

Open Access Dissertations and Theses

In this thesis, we apply the Ordinary Least Squares (OLS) and the Generalized Least Squares (GLS) methods for the estimation of Autoregressive Conditional Duration (ACD) models, as opposed to the typical approach of using the Quasi Maximum Likelihood Estimation (QMLE).

The advantages of OLS and GLS as the underlying methods of estimation lie in their theoretical ease and computational convenience. The latter property is crucial for high frequency trading, where a transaction decision needs to be made within a minute. We show that both OLS and GLS estimates are asymptotically consistent and normally distributed. The normal approximation does not seem ...


Comparative Analysis Of Dispersion Parameter Estimates In Loglinear Modeling: Applied To E-Commerce Sales And Customer Data, Scott Davis California Polytechnic State University

Statistics

When loglinear models are applied to count data, the issue of over-dispersion often arises. Moment and maximum likelihood estimation methods for accounting for over-dispersion are widely used because they allow for model checking tools such as Chi-square, F, and likelihood ratio tests. Here is a comparison between R functions that each use one method: glm.nb uses MLE, and glm.poisson.disp uses MME. The Index of Dissimilarity and visual model selection (ECDF plots) are also incorporated. These are applied to sales data using product and customer information compiled over the last five years that was generously provided by an ...
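As a rough illustration of what a moment-based dispersion estimate measures, the sketch below computes the Pearson statistic divided by its residual degrees of freedom for an intercept-only Poisson model. It is not the glm.nb or glm.poisson.disp implementation, and the simulated counts and the helper name `pearson_dispersion` are assumptions made for the example. A value near 1 is consistent with a Poisson model; values well above 1 indicate over-dispersion.

```python
import numpy as np

def pearson_dispersion(y):
    """Moment estimate of dispersion for an intercept-only Poisson model.

    The fitted mean is the sample mean, so one parameter is estimated
    and the residual degrees of freedom are len(y) - 1.
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    return np.sum((y - mu) ** 2 / mu) / (len(y) - 1)

rng = np.random.default_rng(0)
n_obs = 20_000

# Poisson counts with mean 8: variance equals the mean.
poisson_counts = rng.poisson(8.0, size=n_obs)

# Negative binomial counts with mean 8 and variance 40 (over-dispersed).
nb_counts = rng.negative_binomial(2, 0.2, size=n_obs)

phi_poisson = pearson_dispersion(poisson_counts)
phi_nb = pearson_dispersion(nb_counts)
print(f"Poisson data:           dispersion = {phi_poisson:.2f}")
print(f"negative binomial data: dispersion = {phi_nb:.2f}")
```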


Analysis Of Dietary Patterns Over Freshman Year Of College, Chelsea Lofland California Polytechnic State University

Statistics

This analysis is an investigation of changes in Cal Poly students’ eating habits over freshman year. The motivation behind this was an interest in college students’ lifestyles; college is the first time most students live on their own and it can be an important maturation period. College is stressful, exciting, liberating, and terrifying all at the same time. This distinctive life experience, along with my desire to handle big and messy data, led me to this research question.

The response variable analyzed was food consumption and the explanatory variables were: sex, race, quarter, food group, stress, exercise, BMI, sleep quality ...


Sparse Principal Component Analysis For High-Dimensional Data: A Comparative Study, Ashley J. Bonner McMaster University

Open Access Dissertations and Theses

Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become flooded and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws but has not been tested in comparative detail. Methods: Performances of three Sparse PCA methods were evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested ...


Spatially Dense Drip Hydrological Monitoring And Infiltration Behaviour At The Wellington Caves, South East Australia, Catherine N. Jex, Gregoire Mariethoz, Andy Baker, Peter Graham, Martin S. Andersen, Ian Acworth, Nerilee Edwards, Cecilia Azcurra University of South Florida

International Journal of Speleology

Although karst regions are recognised as significant groundwater resources, the nature of groundwater flow paths in the unsaturated zone of such fractured rock is at present poorly understood. Many traditional methods for constraining groundwater flow regimes in karst aquifers are focussed on the faster drainage components and are unable to inform on the smaller fracture or matrix-flow components of the system. Caves, however, offer a natural inception point to observe both the long term storage and the preferential movement of water through the unsaturated zone of such fractured carbonate rock by monitoring drip rates of stalactites ...


Improvement Of Statistical Process Control At St. Jude Medical's Cardiac Manufacturing Facility, Christopher Lance Edwards California Polytechnic State University

Master's Theses and Project Reports

Six Sigma is a methodology in which companies strive for reproducible results, ending up with a 99.9996% chance that their product will be free of defects. For companies to reach Six Sigma, statistical process control (SPC) needs to be introduced. SPC has many different tools associated with it, control charts being one of them. Control charts play a vital role in managing how a process is behaving. They allow users to identify special causes, or shifts, so that users can change the process to keep producing good products, free of defects.

There are many factories and manufacturing facilities having ...