30 Institutions 390 Full-Text Articles 412 Authors 85,002 Downloads
Recent Articles in Applied Statistics
Subsemble: An Ensemble Method For Combining Subset-Specific Algorithm Fits, Stephanie Sapp, Mark J. van der Laan, John Canny COBRA
U.C. Berkeley Division of Biostatistics Working Paper Series
Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be ...
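The partition, fit, and combine steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the underlying algorithm (a least-squares line), the least-squares combination step, the deterministic subset and fold assignment, and all function names are assumptions made for the example.

```python
import numpy as np

def fit_line(x, y):
    """Underlying algorithm for this sketch: least-squares line y = a + b*x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_line(coef, x):
    return coef[0] + coef[1] * np.atleast_1d(np.asarray(x, dtype=float))

def subsemble_fit(x, y, n_subsets=3, n_folds=5):
    """Return combination weights and one fit per subset (hypothetical helper)."""
    n = len(y)
    idx = np.arange(n)
    subset = idx % n_subsets              # subset label for each observation
    fold = (idx // n_subsets) % n_folds   # fold label, balanced within subsets

    # Cross-validated predictions: column j comes from the fit on subset j
    # with the current fold held out.
    cv_pred = np.empty((n, n_subsets))
    for v in range(n_folds):
        held_out = fold == v
        for j in range(n_subsets):
            train = (subset == j) & ~held_out
            coef = fit_line(x[train], y[train])
            cv_pred[held_out, j] = predict_line(coef, x[held_out])

    # Combination step (assumed here): least squares of y on the CV predictions.
    # rcond truncates near-collinear directions, because subset-specific fits
    # tend to produce highly correlated predictions.
    weights, *_ = np.linalg.lstsq(cv_pred, y, rcond=1e-8)

    # Final subset-specific fits use every observation in each subset.
    fits = [fit_line(x[subset == j], y[subset == j]) for j in range(n_subsets)]
    return weights, fits

def subsemble_predict(weights, fits, x_new):
    preds = np.column_stack([predict_line(c, x_new) for c in fits])
    return preds @ weights

# Noise-free toy data, so every subset-specific fit recovers the same line.
x = np.linspace(0.0, 10.0, 60)
y = 1.0 + 2.0 * x
weights, fits = subsemble_fit(x, y)
prediction = subsemble_predict(weights, fits, np.array([3.0]))
```

The cross-validated predictions keep the combination weights from being learned on the same observations used to train the subset-specific fits.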
Predicting Customer Satisfaction From Dental Implants Perception Data, Omnya Elmassad McMaster University
Open Access Dissertations and Theses
In recent years, measuring customer satisfaction has become one of the key concerns of market research studies. One of the basic features of leading companies is their success in fulfilling their customers’ demands. For that reason, companies attempt to find out what essential factors dominate their customers’ purchasing habits.
Millennium Research Group (MRG) - a global authority on medical technology market intelligence - uses a web-based survey tool to collect information about customers’ level of satisfaction. One of their surveys is designed to gather information about the practitioner’s level of satisfaction on different brands of dental implants. The Dental ...
Assessment Of Tillage Practices Using Landsat-TM 5 In Nebraska, Sonisa Sharma University of Nebraska - Lincoln
Dissertations & Theses in Natural Resources
Tillage management practices are an important component of crop production and of federal and state conservation efforts and crop subsidy programs. Crop residue created by conservation tillage reduces soil erosion and reduces evaporation from exposed soil. Agro-hydrological models require information on tillage practices to estimate their impacts on soil-water-holding capacity, total evapotranspiration, carbon sequestration, water runoff and water and wind erosion for agricultural lands. Classification of tillage practices using remote sensing offers promise for the rapid collection of tillage information on individual fields over large areas. Using satellite imagery proves to be challenging due to the similarity in spectral signatures ...
From Unbiased Numerical Estimates To Unbiased Interval Estimates, Baokun Li, Gang Xiang, Vladik Kreinovich, Panagios Moscopoulos University of Texas at El Paso
Departmental Technical Reports (CS)
One of the main objectives of statistics is to estimate the parameters of a probability distribution based on a sample taken from this distribution. Of course, since the sample is finite, the estimate X is, in general, different from the actual value x of the corresponding parameter. What we can require is that the corresponding estimate is unbiased, i.e., that the mean value of the difference X - x is equal to 0: E[X] = x. In some problems, unbiased estimates are not possible. We show that in some such problems, it is possible to have interval unbiased estimates, i ...
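As a standard textbook illustration of the unbiasedness condition stated in this abstract (it is not taken from the report itself), consider an i.i.d. sample $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$; these symbols are introduced here only for the example. The sample mean $\bar{X}$ is unbiased for $\mu$, while the variance estimate that divides by $n$ is biased:

```latex
\[
E[\bar{X}] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
           = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu ,
\qquad
E\!\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
           = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2 .
\]
```

Dividing by $n-1$ instead of $n$ removes the bias in the second estimate.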
Significant Themes In 19th-Century Literature, Matthew L. Jockers, David Mimno University of Nebraska - Lincoln
Faculty Publications -- Department of English
External factors such as author gender, author nationality, and date of publication affect both the choice of literary themes in novels and the expression of those themes, but the extent of this association is difficult to quantify. In this work, we apply statistical methods to identify and extract hundreds of "topics" from a corpus of 3,346 works of 19th-century British, Irish, and American fiction. We use these topics as a measurable, data-driven proxy for literary themes. External factors may predict fluctuations in the use of themes and the individual word choices within themes. We use topics to measure the ...
A Bayesian Regression Tree Approach To Identify The Effect Of Nanoparticles Properties On Toxicity Profiles, Cecile Low-Kam, Haiyuan Zhang, Zhaoxia Ji, Tian Xia, Jeffrey I. Zinc, Andre Nel, Donatello Telesca COBRA
COBRA Preprint Series
We introduce a Bayesian multiple regression tree model to characterize relationships between physico-chemical properties of nanoparticles and their in-vitro toxicity over multiple doses and times of exposure. Unlike conventional models that rely on data summaries, our model solves the low sample size issue and avoids arbitrary loss of information by combining all measurements from a general exposure experiment across doses, times of exposure, and replicates. The proposed technique integrates Bayesian trees for modeling threshold effects and interactions, and penalized B-splines for dose and time-response surfaces smoothing. The resulting posterior distribution is sampled via a Markov Chain Monte Carlo algorithm. This ...
Do Non-Response Follow-Ups Improve Or Reduce Data Quality?: A Review Of The Existing Literature, Kristen Olson University of Nebraska - Lincoln
Sociology Department, Faculty Publications
The paper systematically reviews existing literature on the relationship between the level of effort to recruit a sampled person and the measurement quality of survey data. Hypotheses proposed for this relationship are reviewed. Empirical findings for the relationship between level of effort as measured by paradata (the number of follow-up attempts, refusal conversion and time in the field) and question-specific item non-response rates, aggregate measures of item non-response rates, response accuracy and various measurement errors on attitudinal questions are examined through a qualitative review.
Missing At Random And Ignorability For Inferences About Subsets Of Parameters With Missing Data, Roderick J. Little, Sahar Zanganeh COBRA
The University of Michigan Department of Biostatistics Working Paper Series
For likelihood-based inferences from data with missing values, Rubin (1976) showed that the missing data mechanism can be ignored when (a) the missing data are missing at random (MAR), in the sense that missingness does not depend on the missing values after conditioning on the observed data, and (b) the parameters of the data model and the missing-data mechanism are distinct; that is, there are no a priori ties, via parameter space restrictions or prior distributions, between the parameters of the data model and the parameters of the model for the mechanism. Rubin described (a) and (b) as the "weakest ...
Patron-Driven Acquisition And Circulation At An Academic Library: Interaction Effects And Circulation Performance Of Print Books Acquired Via Librarians’ Orders, Approval Plans, And Patrons’ Interlibrary Loan Requests, David Tyler, Christina D. Falci, Joyce C. Melvin, MaryLou Epp, Anita M. Kreps University of Nebraska - Lincoln
Faculty Publications, UNL Libraries
Numerous publications on patron-driven acquisition (PDA) for print books and similar materials have reported that patron-requested materials circulate more. Tying circulation to selector may be failing to address the complex of factors that contributes to items’ circulation. In the present study, the authors revisit a PDA program’s data to determine whether PDA print books’ circulation advantage persists when the potential interactions of several additional variables are taken into account. As with prior studies, library patrons were significantly better predictors of circulation than were librarians or approval plans. However, librarians proved to be significantly better predictors than were approval ...
Environmentally Friendly Sizing Agent From Corn Distillers Dried Grains, Yue Zhang University of Nebraska - Lincoln
Open Access Theses and Dissertations from the College of Education and Human Sciences
Distillers dried grains (DDGS), the coproducts of corn ethanol production, were used as a textile sizing agent on cotton, polyester and polyester/cotton blends in an effort to find inexpensive and biodegradable alternatives to sizing agents such as poly(vinyl alcohol) that are currently used. Although DDGS is an inexpensive, biodegradable and abundant co-product, it has limited industrial applications. DDGS is a mixture of carbohydrates, proteins and oil which are used as sizing agents or as size additives. The effects of DDGS extraction conditions on sizing evaluation parameters such as fiber adhesion, film properties, viscosity and fabric abrasion were studied ...
Global Quantitative Assessment Of The Colorectal Polyp Burden In, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi The University of Texas
Jeffrey S. Morris
Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.
Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.
Design: A single-arm, phase II trial.
Patients: Twenty-seven patients with FAP.
Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.
Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were ...
Analysis Of Alcohol Use Among Pregnant Women In San Luis Obispo County, Samantha Law California Polytechnic State University
Drinking alcohol during pregnancy is harmful to the fetus, and can lead to serious alcohol-related developmental birth defects. Utilizing prenatal screening, such as the 4P’s Plus© screening tool, during a woman’s first prenatal doctor’s visit can help educate women and reduce continued alcohol use during pregnancy. Currently the CDC reports that 1 in 13 women in the US drink alcohol while pregnant, compared to local reports that 1 in 3 women in San Luis Obispo County continue to drink alcohol during pregnancy. A primary concern for many local county health care experts and organizations is to raise ...
The Implementation Of The Shear Correlation Function And The Matter Power Spectrum In R, Allison A. Scheppelmann, Deborah J. Bard California Polytechnic State University
STEM Teacher and Researcher (STAR) Program Posters
Weak gravitational lensing is an important tool in understanding the large-scale structure of the universe. Two components in understanding the effect of weak gravitational lensing are the shear correlation function and the matter power spectrum. The calculation of these values is often complicated and time consuming. In order to decrease the cost of these calculations, they were implemented in R using parallelization. As a result, the calculations complete faster, and the process can easily be changed to fit the needs of each researcher using the algorithms created in R.
Multitarget Tracking Using Multistatic Sensors, Maheswaran Subramaniam McMaster University
Open Access Dissertations and Theses
In this thesis the problem of multitarget tracking in multistatic sensor networks is studied. This thesis focuses on tracking airborne targets by utilizing transmitters of opportunity in the surveillance region. The Passive Coherent Location (PCL) system, which uses existing commercial signals (e.g., FM broadcast, digital TV) as the illuminators of opportunity for target tracking, is an emerging technology in air defence systems. PCL systems have many advantages over conventional radar systems, such as low cost, covert operation and low vulnerability to electronic countermeasures.
Another opportunistic signal available in the surveillance region is the multipath signal. In this thesis ...
Robust Estimation Of Autoregressive Conditional Duration Models, Rola S. El Sebai McMaster University
Open Access Dissertations and Theses
In this thesis, we apply the Ordinary Least Squares (OLS) and the Generalized Least Squares (GLS) methods for the estimation of Autoregressive Conditional Duration (ACD) models, as opposed to the typical approach of using the Quasi Maximum Likelihood Estimation (QMLE).
The advantages of OLS and GLS as the underlying methods of estimation lie in their theoretical ease and computational convenience. The latter property is crucial for high frequency trading, where a transaction decision needs to be made within a minute. We show that both OLS and GLS estimates are asymptotically consistent and normally distributed. The normal approximation does not seem ...
Comparative Analysis Of Dispersion Parameter Estimates In Loglinear Modeling: Applied To E-Commerce Sales And Customer Data, Scott Davis California Polytechnic State University
When loglinear models are applied to count data, the issue of over-dispersion often arises. Moment and maximum likelihood estimation methods for handling over-dispersion are widely used because they allow for model checking tools such as Chi-square, F, and likelihood ratio tests. Here is a comparison between two R functions that each use one method: glm.nb uses MLE, and glm.poisson.disp uses MME. The Index of Dissimilarity and visual model selection (ECDF plots) are also incorporated. These are applied to sales data using product and customer information compiled over the last five years that was generously provided by an ...
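To make the notion of a dispersion parameter concrete, here is a minimal sketch of one common moment-type estimate: the Pearson chi-square statistic divided by the residual degrees of freedom. It is a generic illustration and is not claimed to be what either R function named in the abstract computes; the function name and the toy counts are hypothetical.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Values well above 1 suggest over-dispersion relative to a Poisson model.
    """
    if len(counts) != len(fitted_means):
        raise ValueError("counts and fitted_means must have the same length")
    dof = len(counts) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / dof

# Toy example: an intercept-only Poisson model, whose fitted mean for every
# observation is the sample mean (one fitted parameter).
counts = [0, 2, 4, 6, 8]
mean_count = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean_count] * len(counts), n_params=1)
```

For these toy counts the estimate is 2.5, meaning the variance is about two and a half times what a Poisson model with the same mean would imply.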
Analysis Of Dietary Patterns Over Freshman Year Of College, Chelsea Lofland California Polytechnic State University
This analysis is an investigation of changes in Cal Poly students’ eating habits over freshman year. The motivation behind this was an interest in college students’ lifestyles; college is the first time most students live on their own and it can be an important maturation period. College is stressful, exciting, liberating, and terrifying all at the same time. This distinctive life experience, along with my desire to handle big and messy data, led me to this research question.
The response variable analyzed was food consumption and the explanatory variables were: sex, race, quarter, food group, stress, exercise, BMI, sleep quality ...
Sparse Principal Component Analysis For High-Dimensional Data: A Comparative Study, Ashley J. Bonner McMaster University
Open Access Dissertations and Theses
Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become flooded and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws, but these methods have not been tested in comparative detail. Methods: Performances of three Sparse PCA methods were evaluated through simulations. Data was generated for 56 different data-structures, varying p, the number of underlying groups and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested ...
Spatially Dense Drip Hydrological Monitoring And Infiltration Behaviour At The Wellington Caves, South East Australia, Catherine N. Jex, Gregoire Mariethoz, Andy Baker, Peter Graham, Martin S. Andersen, Ian Acworth, Nerilee Edwards, Cecilia Azcurra University of South Florida
International Journal of Speleology
Although karst regions are recognised as significant groundwater resources, the nature of groundwater flow paths in the unsaturated zone of such fractured rock is at present poorly understood. Many traditional methods for constraining groundwater flow regimes in karst aquifers are focussed on the faster drainage components and are unable to inform on the smaller fracture or matrix-flow components of the system. Caves, however, offer a natural inception point to observe both the long term storage and the preferential movement of water through the unsaturated zone of such fractured carbonate rock by monitoring of drip rates of stalactites ...
Improvement Of Statistical Process Control At St. Jude Medical's Cardiac Manufacturing Facility, Christopher Lance Edwards California Polytechnic State University
Master's Theses and Project Reports
Six Sigma is a methodology in which companies strive for reproducible results, ending up with a 99.99966% chance that their product will be free of defects. In order for companies to reach Six Sigma, statistical process control (SPC) needs to be introduced. SPC has many different tools associated with it, control charts being one of them. Control charts play a vital role in monitoring how a process is behaving. They allow users to identify special causes, or shifts, so that the process can be changed to keep producing good products, free of defects.
There are many factories and manufacturing facilities having ...
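The control-chart idea described in this abstract can be sketched as follows. This is a minimal illustration of an individuals chart with three-sigma limits, assuming the usual moving-range estimate of process spread; it is not the thesis's own procedure, and the function names and sample measurements are hypothetical.

```python
from statistics import mean

# Standard control-chart constant d2 for moving ranges of two observations.
D2_FOR_N2 = 1.128

def individuals_limits(baseline):
    """Return (lower, center, upper) three-sigma limits from in-control data."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_FOR_N2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

def out_of_control(points, lower, upper):
    """Indices of new measurements that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]

# Hypothetical baseline measurements taken while the process was stable.
baseline = [10, 12, 11, 13, 12, 10, 11, 12]
lower, center, upper = individuals_limits(baseline)

# New measurements: the second and fourth fall outside the limits.
flagged = out_of_control([11, 16, 12, 7], lower, upper)
```

Points flagged this way are candidates for special-cause investigation; in practice, additional run rules are often applied alongside the three-sigma limits.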
Based on downloads this month
How Do You Interpret A Confidence Interval?, Paul Savory
Why Divide By (N-1) For Sample Standard Deviation?, Paul Savory
Larger Board Size And Decreasing Firm Value In Small Firms, Martin Wells, Theodore Eisenberg
Statistical Analysis Of Texas Holdem Poker, Daniel Bragonier
Link Spamming Wikipedia For Profit, Insup Lee, Oleg Sokolsky, Krishna Venkatasubramanian, Jian Chang, Andrew West
Judicial Politics, Death Penalty Appeals, And Case Selection: An Empirical Study, Theodore Eisenberg, John Blume