
2018

Articles 1 - 26 of 26

Full-Text Articles in Other Statistics and Probability

Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett Dec 2018

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where the response variable is categorical) and regression (where the response is continuous). Random forests provide proximities as well as both local and global measures of variable importance. However, using these quantities effectively to interpret the forest requires special tools. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create …
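Random forest proximities are conventionally defined as the fraction of trees in which two observations land in the same terminal node. The minimal Python sketch below assumes the leaf indices have already been extracted (for example, scikit-learn's `RandomForestClassifier.apply` returns one leaf index per sample per tree); the toy `leaves` matrix is hypothetical, and this is not Rfviz's own code:

```python
from typing import List


def proximity_matrix(leaves: List[List[int]]) -> List[List[float]]:
    """Return the n x n proximity matrix from an n x T matrix of leaf indices.

    leaves[i][t] is the terminal node that sample i reaches in tree t.
    Proximity(i, j) is the fraction of trees in which i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments for 3 samples in 4 trees.
leaves = [
    [1, 4, 2, 7],
    [1, 4, 3, 7],
    [2, 5, 3, 8],
]
for row in proximity_matrix(leaves):
    print(row)
```

Here samples 0 and 1 share a leaf in three of four trees, so their proximity is 0.75; one minus the proximity is often used as a dissimilarity for multidimensional scaling plots.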


Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri Oct 2018

Publications and Research

Water risk management is a ubiquitous challenge faced by stakeholders in the water and agricultural sectors. We present a methodological framework for forecasting water storage requirements and apply it to risk assessment in India. The application focused on forecasting crop water stress for potatoes grown during the monsoon season in the Satara district of Maharashtra. Pre-season large-scale climate predictors of water stress were selected by an exhaustive search that favored the highest ranked probability skill score and the lowest root-mean-squared error under leave-one-out cross-validation. Adaptive forecasts were made in the years …
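The ranked probability skill score (RPSS) compares the ranked probability score (RPS) of categorical forecasts against a reference forecast, typically climatology. A minimal sketch, assuming three tercile categories (below, near, and above normal) and an equal-odds climatological reference; the forecasts and observations are hypothetical, and the study's exact normalization may differ:

```python
from typing import Sequence


def rps(probs: Sequence[float], observed: int) -> float:
    """Ranked probability score for one categorical forecast (lower is better).

    probs[k] is the forecast probability of category k; observed is the
    index of the category that actually occurred.
    """
    cum_forecast = 0.0
    cum_obs = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cum_forecast += p
        cum_obs += 1.0 if k == observed else 0.0
        score += (cum_forecast - cum_obs) ** 2
    return score


def rpss(forecasts, observations, reference=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Skill relative to a reference forecast: 1 is perfect, 0 is no skill."""
    n = len(observations)
    rps_f = sum(rps(f, o) for f, o in zip(forecasts, observations)) / n
    rps_ref = sum(rps(reference, o) for o in observations) / n
    return 1.0 - rps_f / rps_ref


# Hypothetical tercile forecasts for four seasons and the observed terciles.
forecasts = [(0.6, 0.3, 0.1), (0.2, 0.5, 0.3), (0.1, 0.3, 0.6), (0.5, 0.3, 0.2)]
observed = [0, 1, 2, 1]
print(round(rpss(forecasts, observed), 3))
```

Because RPS works on cumulative probabilities, a forecast that puts weight on a category adjacent to the observed one is penalized less than one that puts weight on a distant category.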


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
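Reading logistic regression coefficients as measures of relative importance is only meaningful when the features share a common scale, and exponentiating a coefficient gives the multiplicative change in the odds per unit increase of that feature. The sketch below fits such a model by batch gradient descent in plain Python; the two features and labels are hypothetical toy data, not Yelp reviews, and this is not the authors' pipeline:

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and intercept by minimizing the mean log-loss with gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b) -> float:
    """Share of examples whose predicted class (p >= 0.5) matches the label."""
    preds = [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical standardized features: [sentiment_score, noise]; label 1 = recommended.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
odds_ratios = [math.exp(wj) for wj in w]
print(w, odds_ratios, accuracy(X, y, w, b))
```

In this toy example the second feature carries no information about the label, so its coefficient stays at zero (odds ratio 1), while the first feature's coefficient is positive.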


Quantitative Jeopardy Feud, Jonathan M. Gallimore Aug 2018

MSF 600 PR - Gallimore - Fall 2018

This activity - Quantitative Jeopardy Feud - is a method for using a game as a final exam.


Secondary Data Analysis Project, Jonathan M. Gallimore Aug 2018

SF 420 PR - Gallimore - Fall 2018

This activity is designed to give students an opportunity to apply what they have learned in statistics to a real dataset.

This activity will help students apply what they have learned in statistics to real-world data and answer their own research questions. Students will also practice reporting their results in a paper using APA format.


Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum Jul 2018

SMU Data Science Review

In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL by controlling for the context of shots and by using alternative measures to save percentage. In particular, we find save percentage to be both a weakly repeatable skill and a weak predictor of future performance, and we seek other goalie performance calculations that are more robust. To do so, we apply three primary tests (intra-season consistency, intra-season predictability, and inter-season consistency) and extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save …
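Classic save percentage is total saves divided by total shots against. One common way to gauge whether a statistic reflects a repeatable skill is split-half reliability: compute it separately on odd- and even-numbered games and correlate the two halves across goalies. The sketch below assumes that design purely as an illustration; it is not necessarily one of the paper's three tests, and the per-game data are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Game = Tuple[int, int]  # (saves, shots against) for one game


def save_pct(games: Sequence[Game]) -> float:
    """Save percentage over a set of games: total saves / total shots against."""
    saves = sum(s for s, _ in games)
    shots = sum(sh for _, sh in games)
    return saves / shots


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_half_consistency(goalies: List[List[Game]]) -> float:
    """Correlate each goalie's save percentage in odd vs even games."""
    odd = [save_pct(g[0::2]) for g in goalies]
    even = [save_pct(g[1::2]) for g in goalies]
    return pearson(odd, even)


# Hypothetical per-game (saves, shots) for three goalies.
goalies = [
    [(28, 30), (25, 27), (30, 33), (22, 24)],
    [(24, 28), (26, 30), (20, 24), (27, 31)],
    [(31, 32), (29, 31), (33, 35), (25, 26)],
]
print(split_half_consistency(goalies))
```

A low split-half correlation across a realistic number of goalies would suggest that most of the variation in the statistic is noise rather than skill.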


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen May 2018

Electronic Theses and Dissertations

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution of a statistic of interest from a given sample. Generally, the results are good for location parameters such as the population mean and median, and even for a population correlation. However, the results for a population variance, which is a spread parameter, are not as good because of the resampling nature of the bootstrap method. Bootstrap samples are constructed by sampling with replacement; consequently, groups of repeated observations, which have zero variance, appear in these samples. As a result, a bootstrap variance estimator will carry a …
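One concrete consequence of resampling with replacement can be checked with a small simulation: because each bootstrap sample is drawn from the observed data, its unbiased sample variance averages to the plug-in variance with divisor n, that is, (n - 1)/n times the usual sample variance s². A minimal sketch with hypothetical data and an arbitrary seed:

```python
import random
import statistics
from typing import List, Sequence


def bootstrap_variances(data: Sequence[float], n_boot: int = 5000, seed: int = 0) -> List[float]:
    """Unbiased sample variance (divisor n - 1) of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.variance(rng.choices(data, k=n)) for _ in range(n_boot)]


# Hypothetical sample of size n = 8.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 3.0, 6.0]
n = len(data)
s2 = statistics.variance(data)
boot_mean = statistics.fmean(bootstrap_variances(data))
# boot_mean should land close to the last value, not to s2.
print(s2, boot_mean, (n - 1) / n * s2)
```

For small n the factor (n - 1)/n is noticeably below 1, which is one reason the bootstrap performs worse for spread parameters than for location parameters.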


Standard And Anomalous Wave Transport Inside Random Media, Xujun Ma May 2018

Dissertations, Theses, and Capstone Projects

This thesis is a study of wave transport inside random media using random matrix theory. Anderson localization plays a central role in wave transport in random media: as a consequence of destructive interference in multiple scattering, the wave function decays exponentially inside random systems. Anderson localization is a wave effect that applies to both classical and quantum waves. Random matrix theory has been successfully applied to study the statistical properties of wave transport and localization. In particular, the solution of the Dorokhov-Mello-Pereyra-Kumar (DMPK) equation gives the distribution of the transmission.

For wave transport in standard one-dimensional random systems in …
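The exponential decay behind Anderson localization can be illustrated numerically with the transfer-matrix method for the one-dimensional Anderson tight-binding model. This is a standard textbook technique, not the DMPK random-matrix approach used in the thesis. The sketch assumes unit hopping and on-site energies drawn uniformly from [-W/2, W/2], and iterates psi(n+1) = (E - eps(n)) psi(n) - psi(n-1). The growth rate of the amplitude, the Lyapunov exponent gamma, is the inverse localization length, so the typical transmission through a sample of length L falls off roughly like exp(-2 gamma L):

```python
import math
import random


def lyapunov_exponent(energy: float, disorder: float, n_sites: int, seed: int = 0) -> float:
    """Estimate the Lyapunov exponent of the 1D Anderson model by transfer matrices.

    Iterates psi[n+1] = (E - eps[n]) * psi[n] - psi[n-1] with eps[n] uniform in
    [-W/2, W/2], renormalizing at every step to avoid overflow and accumulating
    the logarithm of the growth factor.
    """
    rng = random.Random(seed)
    psi_prev, psi = 0.0, 1.0
    log_growth = 0.0
    for _ in range(n_sites):
        eps = rng.uniform(-disorder / 2, disorder / 2)
        psi_prev, psi = psi, (energy - eps) * psi - psi_prev
        norm = math.hypot(psi, psi_prev)
        log_growth += math.log(norm)
        psi, psi_prev = psi / norm, psi_prev / norm
    return log_growth / n_sites


gamma = lyapunov_exponent(energy=0.0, disorder=2.0, n_sites=50_000)
print(gamma, 1 / gamma)  # exponent and localization length (in lattice sites)
```

With no disorder (W = 0) the exponent is zero and waves propagate freely; any nonzero disorder in one dimension gives a positive exponent, so every eigenstate is localized.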


Initial Evidence Of Construct Validity Of Data From A Self-Assessment Instrument Of Technological Pedagogical Content Knowledge (TPACK) In 2-Year Public College Faculty In Texas, Kristin C. Scott Apr 2018

Human Resource Development Theses and Dissertations

Technological pedagogical content knowledge (TPACK) has been studied in K-12 faculty in the U.S. and around the world using survey methodology. Very few studies of TPACK in post-secondary faculty have been conducted, and no peer-reviewed studies of U.S. post-secondary faculty have been published to date. The present study is the first study of the reliability and validity of data from a TPACK survey conducted with a large sample of U.S. post-secondary faculty. The faculty of 2-year public colleges in Texas will help their institutions meet the goals of the state’s higher education strategic plan, 60x30TX. In order to do …
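Reliability of survey scales is commonly summarized with Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The abstract does not say which reliability coefficient the study used, so treat this as an illustrative sketch with hypothetical Likert responses:

```python
import statistics
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of scores."""
    k = len(responses[0])
    item_vars = [statistics.variance([r[j] for r in responses]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert answers: 5 respondents x 4 items.
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 when items move together across respondents, because the variance of the total score then greatly exceeds the sum of the individual item variances.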


Waste Management By Waste: Removal Of Acid Dyes From Wastewaters Of Textile Coloration Using Fish Scales, S M Fijul Kabir Apr 2018

LSU Master's Theses

There is a pressing need for economical processes that use low-cost bio-sorbents to remove hazardous acid dyes from wool industry wastewaters, since these dyes cause skin and respiratory diseases and disrupt other environmental components. Fish scales (FS), a solid-waste by-product of the fish industry, are usually discarded carelessly, resulting in pungent odor and environmental burden. In this research, the FS of black drum (Pogonias cromis) were used to remove acid dyes (acid red 1 (AR1), acid blue 45 (AB45) and acid yellow 127 (AY127)) from wool industry wastewaters by an adsorption process with a view to …
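Sorption experiments are usually summarized by two standard quantities: percentage dye removal, 100 * (C0 - Ce) / C0, and equilibrium uptake qe = (C0 - Ce) * V / m, where C0 and Ce are the initial and equilibrium dye concentrations, V is the solution volume, and m is the sorbent mass. The abstract does not report these values; the numbers below are purely hypothetical:

```python
def removal_percent(c0: float, ce: float) -> float:
    """Percentage of dye removed from solution (concentrations in mg/L)."""
    return 100.0 * (c0 - ce) / c0


def uptake_mg_per_g(c0: float, ce: float, volume_l: float, mass_g: float) -> float:
    """Equilibrium uptake q_e in mg of dye per g of sorbent."""
    return (c0 - ce) * volume_l / mass_g


# Hypothetical run: 100 mg/L dye reduced to 20 mg/L by 0.1 g of fish scales in 50 mL.
print(removal_percent(100.0, 20.0))             # 80.0
print(uptake_mg_per_g(100.0, 20.0, 0.05, 0.1))  # 40.0
```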


On Some Ridge Regression Estimators For Logistic Regression Models, Ulyana P. Williams Mar 2018

FIU Electronic Theses and Dissertations

The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study was conducted to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of …
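Ridge simulation studies often generate explanatory variables with a controlled degree of correlation using the scheme x_ij = sqrt(1 - rho²) z_ij + rho z_i(p+1), where all z are independent standard normals, so that every pair of predictors has correlation rho². Whether this thesis used exactly that scheme is an assumption here; the sketch only shows how such a design matrix can be produced and checked:

```python
import math
import random
from typing import List


def correlated_predictors(n: int, p: int, rho: float, seed: int = 0) -> List[List[float]]:
    """n x p design matrix whose columns have pairwise correlation rho**2.

    Each x_ij = sqrt(1 - rho^2) * z_ij + rho * z_i(p+1), with all z ~ N(0, 1).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # the common component z_i(p+1)
        rows.append([math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) + rho * shared
                     for _ in range(p)])
    return rows


def column_corr(X: List[List[float]], a: int, b: int) -> float:
    """Sample Pearson correlation between columns a and b."""
    xs, ys = [r[a] for r in X], [r[b] for r in X]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


X = correlated_predictors(n=20_000, p=4, rho=0.9)
print(column_corr(X, 0, 1))  # should be near 0.9**2 = 0.81
```

A binary response for a logistic model would then be drawn from these predictors for a chosen coefficient vector, and each ridge estimator compared with the ML fit across many replicates.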


On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar Mar 2018

FIU Electronic Theses and Dissertations

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …
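For reference, a commonly used form of the Poisson ridge regression estimator shrinks the maximum likelihood (ML) estimate through the weighted cross-product matrix. The notation below (the weight matrix W-hat and the ridge parameter k) is introduced here and is not taken from the thesis, which compares several rules for choosing k:

```latex
\hat{\boldsymbol\beta}_{k} = \left(X^\top \hat{W} X + k I\right)^{-1} X^\top \hat{W} X \, \hat{\boldsymbol\beta}_{\mathrm{ML}},
\qquad
\hat{W} = \operatorname{diag}\left(\hat\mu_1, \dots, \hat\mu_n\right),
\qquad
\hat\mu_i = \exp\left(\mathbf{x}_i^\top \hat{\boldsymbol\beta}_{\mathrm{ML}}\right),
\qquad k \ge 0,

\mathrm{MSE}\left(\hat{\boldsymbol\beta}\right) = E\left[\left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)^\top \left(\hat{\boldsymbol\beta} - \boldsymbol\beta\right)\right].
```

With k = 0 the estimator reduces to the ML estimator; a positive k shrinks it toward zero, trading some bias for lower variance when the predictors are highly correlated. In a Monte Carlo study, the MSE is approximated by averaging the squared distance over simulated replicates.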


Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, Hossein Zareamoghaddam Mar 2018

Electronic Thesis and Dissertation Repository

This thesis advocates the use of shrinkage and penalty techniques for estimating the parameters of a regression model that comprises both parametric and nonparametric components and develops semi-nonparametric density estimation methodologies that are applicable in a regression context.

First, a moment-based approach is introduced whereby a univariate or bivariate density function is approximated by a suitable initial density function adjusted by a linear combination of orthogonal polynomials. Such adjustments are shown to be mathematically equivalent to using standard polynomials in one or two variables. Once extended to apply to density estimation, in which case …
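The idea of adjusting an initial density by orthogonal polynomials can be illustrated in its simplest univariate form: on [0, 1], take the uniform density as the initial density and add orthonormal shifted Legendre polynomials phi_k, whose coefficients are the expectations E[phi_k(X)] and are therefore linear combinations of the moments of X. This is a hedged sketch of that general idea with hypothetical data, not the thesis's estimator, which covers other initial densities and the bivariate case:

```python
import math
import random
from typing import List, Sequence


def legendre(k: int, t: float) -> float:
    """Legendre polynomial P_k(t) on [-1, 1] via the three-term recurrence."""
    p_prev, p = 1.0, t
    if k == 0:
        return p_prev
    for n in range(1, k):
        p_prev, p = p, ((2 * n + 1) * t * p - n * p_prev) / (n + 1)
    return p


def phi(k: int, x: float) -> float:
    """Orthonormal shifted Legendre polynomial on [0, 1]."""
    return math.sqrt(2 * k + 1) * legendre(k, 2 * x - 1)


def fit_coefficients(sample: Sequence[float], degree: int) -> List[float]:
    """c_k = sample mean of phi_k(X), an estimate of E[phi_k(X)]."""
    return [sum(phi(k, x) for x in sample) / len(sample) for k in range(degree + 1)]


def density(x: float, coeffs: Sequence[float]) -> float:
    """Uniform initial density (phi_0 = 1) adjusted by higher-order terms."""
    return sum(c * phi(k, x) for k, c in enumerate(coeffs))


rng = random.Random(1)
sample = [rng.betavariate(2, 5) for _ in range(2000)]  # hypothetical data on [0, 1]
coeffs = fit_coefficients(sample, degree=4)
print([round(density(x, coeffs), 3) for x in (0.1, 0.3, 0.5, 0.9)])
```

Because phi_0 = 1 and every higher-order phi_k integrates to zero on [0, 1], the estimate always integrates to one; it is not guaranteed to be nonnegative, which is a known limitation of truncated series estimators.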


Building A Better Risk Prevention Model, Steven Hornyak Mar 2018

National Youth Advocacy and Resilience Conference

This presentation chronicles the work of Houston County Schools in developing a risk prevention model built on more than ten years of longitudinal student data. In its second year of implementation, Houston At-Risk Profiles (HARP) has proven effective in identifying the students most in need of support and linking them to interventions and supports that lead to improved outcomes and a significantly reduced risk of failure.


Predicting The Next US President By Simulating The Electoral College, Boyan Kostadinov Jan 2018

Publications and Research

We develop a simulation model for predicting the outcome of the US Presidential election based on simulating the distribution of the Electoral College. The simulation model has two parts: (a) estimating the probabilities for a given candidate to win each state and DC, based on state polls, and (b) estimating the probability that a given candidate will win at least 270 electoral votes, and thus win the White House. All simulations are coded using the high-level, open-source programming language R. One of the goals of this paper is to promote computational thinking in any STEM field by illustrating how probabilistic …
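Part (b) can be sketched as a Monte Carlo simulation: treat each state as an independent Bernoulli trial with the candidate's estimated win probability, sum the electoral votes won, and count how often the total reaches the threshold (270 for the real Electoral College). The paper codes its simulations in R; this Python sketch uses a hypothetical three-state map with a lowered threshold for brevity, treats states as independent (a simplifying assumption), and also computes the exact probability by convolution as a check:

```python
import random
from collections import defaultdict
from typing import Sequence, Tuple

State = Tuple[int, float]  # (electoral votes, probability the candidate wins it)


def simulate_win_probability(states: Sequence[State], threshold: int = 270,
                             n_sims: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(total electoral votes >= threshold)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        total = sum(ev for ev, p in states if rng.random() < p)
        wins += total >= threshold
    return wins / n_sims


def exact_win_probability(states: Sequence[State], threshold: int = 270) -> float:
    """Exact P(total >= threshold) by convolving the per-state distributions."""
    dist = {0: 1.0}
    for ev, p in states:
        new = defaultdict(float)
        for total, prob in dist.items():
            new[total + ev] += prob * p
            new[total] += prob * (1 - p)
        dist = new
    return sum(prob for total, prob in dist.items() if total >= threshold)


# Hypothetical toy map: 3 states, 20 electoral votes in total, 11 needed to win.
states = [(10, 0.6), (6, 0.5), (4, 0.3)]
print(simulate_win_probability(states, threshold=11), exact_win_probability(states, threshold=11))
```

In the toy map the candidate must win the 10-vote state plus at least one other, so the exact answer is 0.6 * (1 - 0.5 * 0.7) = 0.39; the simulated estimate should be close to that.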


Queues With Server Utilization Of One, Robert Aidoo Jan 2018

Major Papers

In most queueing systems of type GI/G/1, the stability condition requires that the server utilization be strictly less than 1. The standard exception is a D/D/1 system in which stability still holds for server utilization equal to 1. This paper presents other cases when server utilization can equal 1, and discusses their characteristics.
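Waiting times in a single-server FIFO queue obey Lindley's recursion, W(n+1) = max(0, W(n) + S(n) - A(n+1)), where S(n) is the n-th customer's service time and A(n+1) is the gap before the next arrival. A small sketch with hypothetical inputs shows the D/D/1 case: at utilization exactly 1 the waits stay bounded (here, zero), while at utilization above 1 they grow without bound:

```python
from typing import List, Sequence


def waiting_times(service: Sequence[float], interarrival: Sequence[float]) -> List[float]:
    """Lindley recursion for customer waiting times in a FIFO single-server queue.

    service[n] is customer n's service time; interarrival[n] is the time between
    customer n's arrival and customer n + 1's arrival.
    """
    waits = [0.0]  # the first customer finds the server idle
    for s, a in zip(service, interarrival):
        waits.append(max(0.0, waits[-1] + s - a))
    return waits


# D/D/1 with utilization 1: service time equals interarrival time.
print(waiting_times([1.0] * 5, [1.0] * 5))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# D/D/1 with utilization 1.5: the backlog grows linearly.
print(waiting_times([1.5] * 5, [1.0] * 5))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

Feeding random service or interarrival times into the same recursion is a quick way to see why utilization 1 generally fails to be stable once variability is present.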


Sequential Probing With A Random Start, Joshua Miller Jan 2018

HMC Senior Theses

Processing user requests quickly requires not only fast servers but also methods for quickly locating idle servers to process those requests. Methods of finding idle servers are analogous to open addressing in hash tables, with the key difference that servers may return to an idle state after being busy rather than staying busy. Probing sequences for open addressing are well studied, but algorithms for locating idle servers are less well understood. We investigate sequential probing with a random start as a method for finding idle servers, especially under heavy traffic. We present a procedure for finding the …
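Sequential probing with a random start can be sketched directly: pick a server uniformly at random, then check servers in cyclic order until an idle one is found, exactly like linear probing in a hash table. The `busy` list and the probe count returned here are illustrative; the thesis analyzes how such probe counts behave under traffic, which this sketch does not model:

```python
import random
from typing import Optional, Sequence, Tuple


def sequential_probe(busy: Sequence[bool], start: Optional[int] = None,
                     rng: Optional[random.Random] = None) -> Tuple[Optional[int], int]:
    """Return (index of first idle server found, number of probes used).

    Probing starts at `start` (chosen uniformly at random if None) and wraps
    around cyclically. Returns (None, m) if all m servers are busy.
    """
    m = len(busy)
    if start is None:
        start = (rng or random.Random()).randrange(m)
    for probes in range(1, m + 1):
        idx = (start + probes - 1) % m
        if not busy[idx]:
            return idx, probes
    return None, m


servers = [True, True, False, True]          # only server 2 is idle
print(sequential_probe(servers, start=0))    # (2, 3)
print(sequential_probe(servers, start=3))    # (2, 4)
print(sequential_probe(servers, rng=random.Random(42)))
```

The random start spreads requests over the servers; without it, every request would probe from server 0 and the low-numbered servers would form a persistent busy cluster.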


Existing And Potential Statistical And Computational Approaches For The Analysis Of 3D CT Images Of Plant Roots, Zheng Xu, Camilo Valdes, Jennifer Clarke Jan 2018

Department of Statistics: Faculty Publications

Scanning technologies based on X-ray computed tomography (CT) have been widely used in many scientific fields, including medicine, nanoscience, and materials research. Considerable progress has been made in agronomic and plant science research in recent years thanks to X-ray CT technology. X-ray CT image-based phenotyping methods enable high-throughput, non-destructive measurement of and inference about root systems, which makes downstream studies of complex mechanisms of plants during growth feasible. An impressive amount of plant CT scanning data has been collected, but how to analyze these data efficiently and accurately remains a challenge. We review statistical and computational approaches that have been …


Characterization Of Soybean Protein Adhesives Modified By Xanthan Gum, Chen Feng, Fang Wang, Zheng Xu, Huilin Sui, Yong Fang, Xiaozhi Tang, Xinchun Shen Jan 2018

Department of Statistics: Faculty Publications

The aim of this study was to provide a basis for preparing medical adhesives from soybean protein sources. Soybean protein (SP) adhesives mixed with different concentrations of xanthan gum (XG) were prepared, and their adhesive features were evaluated by physicochemical parameters and an in vitro bone adhesion assay. The results showed that the maximal adhesion strength was achieved by the 5% SP adhesive with 0.5% XG added, 2.6-fold higher than that of SP alone. The addition of XG significantly increased hydrogen bonding and viscosity, and it increased the β-sheet content but decreased the α-helix content in the …


Development Of 11-Plex MOL-PCR Assay For The Rapid Screening Of Samples For Shiga Toxin-Producing Escherichia coli, Travis A. Woods, Heather M. Mendez, Sandy Ortega, Xiaorong Shi, David Marx, Jianfa Bai, Rodney A. Moxley, T. G. Nagaraja, Steven W. Graves, Alina Deshpande Jan 2018

Department of Statistics: Faculty Publications

Strains of Shiga toxin-producing Escherichia coli (STEC) are a serious threat to health, with approximately half of STEC-related food-borne illnesses attributable to contaminated beef. We developed an assay that was able to screen samples for several important STEC-associated serogroups (O26, O45, O103, O104, O111, O121, O145, O157) and three major virulence factors (eae, stx1, stx2) in a rapid, multiplexed format using multiplex oligonucleotide ligation-PCR (MOL-PCR) assay chemistry. This assay detected unique STEC DNA signatures and is meant to be used on samples from various sources related to beef production, providing a multiplex and high-throughput …


Application Of Transfer Learning For Cancer Drug Sensitivity Prediction, Saugato Rahman Dhruba, Raziur Rahman, Kevin Matlock, Souparno Ghosh, Ranadip Pal Jan 2018

Department of Statistics: Faculty Publications

Background: In precision medicine, a scarcity of suitable biological data often hinders the design of an appropriate predictive model. Large-scale pharmacogenomics studies, such as CCLE and GDSC, hold the promise of mitigating this issue. However, data from multiple sources cannot be used together directly because of the distribution shift between them. One way to address this problem is to use transfer learning methods tailored to this specific context.

Results: In this paper, we present two novel approaches for incorporating information from a secondary database to improve prediction in a target database. The first …
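A very simple way to bridge a distribution shift between two pharmacogenomic databases is to learn a mapping on the cell lines they share, for example a least-squares linear map from responses recorded in the secondary database to those in the target database, and to apply it to secondary-database values before reuse. This is a hypothetical baseline for illustration, not either of the paper's two approaches, and the numbers are made up:

```python
from typing import Sequence, Tuple


def fit_linear_map(source: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept mapping source-scale values to the target scale."""
    n = len(source)
    ms, mt = sum(source) / n, sum(target) / n
    sxx = sum((s - ms) ** 2 for s in source)
    sxy = sum((s - ms) * (t - mt) for s, t in zip(source, target))
    slope = sxy / sxx
    return slope, mt - slope * ms


# Hypothetical responses of the same five cell lines to one drug in both databases.
secondary = [0.10, 0.25, 0.40, 0.55, 0.70]
target = [1.3, 1.6, 1.9, 2.2, 2.5]
slope, intercept = fit_linear_map(secondary, target)
# Map a secondary-only cell line's response onto the target scale.
print(slope * 0.33 + intercept)
```

A linear map only corrects shifts in location and scale; more structured shifts are what motivate dedicated transfer learning methods.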


Investigation Of Model Stacking For Drug Sensitivity Prediction, Kevin Matlock, Carlos De Niz, Raziur Rahman, Souparno Ghosh, Ranadip Pal Jan 2018

Department of Statistics: Faculty Publications

Background: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. However, the availability of other data types, including information on multiple tested drugs, calls for predictive models that incorporate these various data types.

Results: We explore the predictive performance of model stacking and the effect of stacking on the predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing …
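In its simplest form, stacking fits a second-level model on the base models' predictions. A minimal sketch, assuming two base models and a no-intercept least-squares combiner solved through its 2 x 2 normal equations; in practice the base predictions fed to the combiner should be out-of-fold (cross-validated) to avoid overfitting, and the values here are hypothetical:

```python
from typing import Sequence, Tuple


def stacking_weights(p1: Sequence[float], p2: Sequence[float],
                     y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (w1, w2) minimizing sum (y - w1*p1 - w2*p2)^2."""
    a11 = sum(u * u for u in p1)
    a12 = sum(u * v for u, v in zip(p1, p2))
    a22 = sum(v * v for v in p2)
    b1 = sum(u * t for u, t in zip(p1, y))
    b2 = sum(v * t for v, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2 x 2 normal equations.
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


# Hypothetical out-of-fold predictions from two base models (e.g. one per data type).
pred_a = [0.2, 0.5, 0.9, 0.4, 0.7]
pred_b = [0.3, 0.4, 0.8, 0.6, 0.5]
observed = [0.25, 0.45, 0.85, 0.5, 0.6]
w1, w2 = stacking_weights(pred_a, pred_b, observed)
stacked = [w1 * a + w2 * b for a, b in zip(pred_a, pred_b)]
print(w1, w2)
```

On the data used to fit the weights, the stacked squared error can never exceed that of either base model alone, since each base model corresponds to one feasible choice of weights.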


Multiclass Classification Using Support Vector Machines, Duleep Prasanna W. Rathgamage Don Jan 2018

Electronic Theses and Dissertations

In this thesis, we discuss different SVM methods for multiclass classification and introduce the Divide and Conquer Support Vector Machine (DCSVM) algorithm, which relies on data sparsity in high-dimensional space and performs a smart partitioning of the whole training data set into disjoint subsets that are easily separable. A single prediction performed between two partitions eliminates one or more classes in a single partition, leaving only a reduced number of candidate classes for subsequent steps. The algorithm continues recursively, reducing the number of classes at each step until a final binary decision is made between the last two classes …
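The recursive class-elimination idea can be sketched as a binary tree over groups of classes: each internal node decides between two groups, and prediction descends until one class remains. In this hedged sketch the groups are formed simply by halving the class list, and each node's binary decision is a nearest-centroid rule standing in for a trained SVM; DCSVM's own separability-based partitioning is not reproduced, and the data are hypothetical:

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Vector = Sequence[float]
Node = Union[int, Tuple[List[int], List[int], "Node", "Node"]]


def class_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def build_tree(classes: List[int]) -> Node:
    """Recursively split the candidate classes into two groups (here: halves)."""
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    return (left, right, build_tree(left), build_tree(right))


def predict(node: Node, x: Vector, centroids: Dict[int, List[float]]) -> int:
    """Descend the tree; each binary decision eliminates one group of classes."""
    while not isinstance(node, int):
        left, right, left_child, right_child = node
        d_left = min(math.dist(x, centroids[c]) for c in left)
        d_right = min(math.dist(x, centroids[c]) for c in right)
        node = left_child if d_left <= d_right else right_child
    return node


X = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0], [30.0], [31.0]]
y = [0, 0, 1, 1, 2, 2, 3, 3]
centroids = class_centroids(X, y)
tree = build_tree(sorted(centroids))
print(predict(tree, [19.0], centroids))  # 2
```

With K classes and balanced splits, a prediction needs only about log2(K) binary decisions, compared with K(K - 1)/2 pairwise classifiers in a one-vs-one scheme.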


Effect Of Neuromodulation Of Short-Term Plasticity On Information Processing In Hippocampal Interneuron Synapses, Elham Bayat Mokhtari Jan 2018

Graduate Student Theses, Dissertations, & Professional Papers

Neurons convey information about the complex, dynamic environment in the form of signals. Computational neuroscience provides a theoretical foundation for enhancing our understanding of the nervous system. The aim of this dissertation is to present techniques for studying the brain and how it processes information, in particular in neurons of the hippocampus.

We begin with a brief review of the history of neuroscience and the biological background of basic neurons. Appreciating the importance of information theory requires familiarity with its basics, which are presented in Chapter 2. In Chapter 3, we use information theory to estimate the amount of …
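A common building block in such information-theoretic analyses is the plug-in (maximum-likelihood) estimate of the mutual information between a discretized stimulus S and a discretized response R, I(S; R) = sum over (s, r) of p(s, r) log2[p(s, r) / (p(s) p(r))]. The sketch below uses hypothetical paired observations; plug-in estimates are biased upward for small samples, and the dissertation's own estimators may differ:

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information_bits(stimuli: Sequence[Hashable], responses: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(S; R) in bits from paired discrete observations."""
    n = len(stimuli)
    joint = Counter(zip(stimuli, responses))
    p_s = Counter(stimuli)
    p_r = Counter(responses)
    mi = 0.0
    for (s, r), count in joint.items():
        p_sr = count / n
        mi += p_sr * math.log2(p_sr / ((p_s[s] / n) * (p_r[r] / n)))
    return mi


# Hypothetical trials: stimulus condition and binned spike count.
stim = ["A", "A", "B", "B", "A", "B", "A", "B"]
spikes = [3, 3, 0, 1, 3, 0, 2, 0]
print(mutual_information_bits(stim, spikes))
```

Here the response perfectly identifies the stimulus, so the estimate equals the stimulus entropy of 1 bit; a response unrelated to the stimulus would give 0 bits.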


Old English Character Recognition Using Neural Networks, Sattajit Sutradhar Jan 2018

Electronic Theses and Dissertations

Character recognition has captured the interest of researchers since the beginning of the twentieth century. While optical character recognition (OCR) for printed material is now very robust and widespread, the recognition of handwritten material lags behind. In our digital era, more and more historical handwritten documents are digitized and made available to the general public. However, these digital copies of handwritten material lack the automatic content recognition available for their printed counterparts. We propose a practical, accurate, and computationally efficient method for Old English character recognition from manuscript images. Our method relies on a modern machine learning …


Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh Jan 2018

Electronic Theses and Dissertations

Three new generalized distributions, developed via competing risk, gamma generator, Marshall-Olkin generator, and exponentiation techniques, are proposed and studied. Structural properties, including quantile functions, hazard rate functions, moments, conditional moments, mean deviations, Rényi entropy, the distribution of order statistics, and maximum likelihood estimates, are presented. Monte Carlo simulation is employed to examine the performance of the proposed distributions. Applications of the generalized distributions to real lifetime data are presented to illustrate the usefulness of the models.
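Two of the generators named above act on any baseline CDF F: exponentiation gives G(x) = F(x)^a with a > 0, and the Marshall-Olkin generator gives G(x) = F(x) / (F(x) + alpha (1 - F(x))) with alpha > 0. A minimal sketch applying both to an exponential baseline, together with the closed-form quantile function of the exponentiated exponential used for inverse-transform sampling; the baseline and parameter values are hypothetical choices, and the thesis's three new distributions are not reproduced:

```python
import math
import random


def exp_cdf(x: float, lam: float = 1.0) -> float:
    """Baseline exponential CDF F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def exponentiated_cdf(x: float, a: float, lam: float = 1.0) -> float:
    """Exponentiated generator: G(x) = F(x) ** a."""
    return exp_cdf(x, lam) ** a


def marshall_olkin_cdf(x: float, alpha: float, lam: float = 1.0) -> float:
    """Marshall-Olkin generator: G(x) = F / (F + alpha * (1 - F))."""
    f = exp_cdf(x, lam)
    return f / (f + alpha * (1.0 - f))


def exponentiated_exp_quantile(u: float, a: float, lam: float = 1.0) -> float:
    """Inverse of G(x) = (1 - exp(-lam x)) ** a, for 0 <= u < 1."""
    return -math.log(1.0 - u ** (1.0 / a)) / lam


# Inverse-transform sampling from the exponentiated exponential (a = 2, lam = 1).
rng = random.Random(0)
draws = [exponentiated_exp_quantile(rng.random(), a=2.0) for _ in range(5)]
print(draws)
```

Both generators reduce to the baseline when their extra parameter equals 1 (a = 1 or alpha = 1), which is a useful sanity check when implementing new members of these families.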