Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 21 of 21

Full-Text Articles in Entire DC Network

Functional Data Learning Using Convolutional Neural Networks, Jose Galarza, Tamer Oraby Feb 2024

School of Mathematical and Statistical Sciences Faculty Publications and Presentations

In this paper, we show how convolutional neural networks (CNNs) can be used in regression and classification learning problems for noisy and non-noisy functional data (FD). The main idea is to transform the FD into a 28 by 28 image. We use a specific but typical architecture of a CNN to perform all the regression exercises of parameter estimation and functional form classification. First, we use some functional case studies of FD with and without random noise to showcase the strength of the new method. In particular, we use it to estimate exponential growth and decay rates, the bandwidths of …
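The excerpt does not say how a curve is turned into a 28 by 28 image, so the following is only a minimal sketch of one plausible approach: rescale the sampled points to the unit square and mark the grid cells they fall in. The function name `curve_to_image` and the binary-pixel encoding are assumptions for illustration, not the authors' method.

```python
import numpy as np

def curve_to_image(x, y, size=28):
    """Rasterize sampled points (x, y) of a curve onto a size-by-size binary grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def unit(v):
        # Rescale to [0, 1]; a constant axis maps to all zeros.
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    cols = np.round(unit(x) * (size - 1)).astype(int)
    # Row 0 is the top of the image, so flip the vertical axis.
    rows = (size - 1) - np.round(unit(y) * (size - 1)).astype(int)
    image = np.zeros((size, size), dtype=np.float32)
    image[rows, cols] = 1.0
    return image

# Hypothetical example: an exponential decay curve with rate 3.
t = np.linspace(0.0, 1.0, 200)
img = curve_to_image(t, np.exp(-3.0 * t))
```

An array of this shape could then be fed to any CNN that accepts 28 by 28 single-channel inputs.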


The Conviction Of Miss Prediction, Dane C. Joseph Jan 2024

Journal of Humanistic Mathematics

Miss Prediction is questioned in a court of law over her involvement in the mischaracterization of linear models when they were inappropriate.


Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels, Mckade S. Thomas May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many real world problems require the prediction of ordinal variables where the values are a set of categories with an ordering to them. However, in many of these cases the categorical nature of the ordinal data is not a desirable outcome. As such, regression models treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models are capable of learning useful information between the discrete levels of the ordinal labels they are trained on, but complex models may learn ordinal labels too closely, missing the information between levels. In this …


The 2015 NCAA Cost-Of-Attendance Stipend And Its Effects On Institutional Financial Aid Packages, Sara Greene Apr 2023

Honors Theses

In 2015, the National Collegiate Athletic Association (NCAA) allowed “Cost of Attendance” (COA) stipends to be offered to athletic recruits for Division I schools. These stipends are intended to allow schools to grant aid to student-athletes beyond a full-ride scholarship to cover additional costs imposed on student-athletes. These stipends created an opportunity for the “Autonomy” Power 5 programs to utilize a competitive tactic to try to win over the top recruits. There is evidence that these COA stipends have caused an increase in the estimated cost of attendance reported by the university. This paper examines if the COA stipends have …


Continuous Semi-Supervised Nonnegative Matrix Factorization, Michael R. Lindstrom, Xiaofu Ding, Feng Liu, Anand Somayajula, Deanna Needell Mar 2023

School of Mathematical and Statistical Sciences Faculty Publications and Presentations

Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In certain applications it is desirable to extract topics and use them to predict quantitative outcomes. In this paper, we show Nonnegative Matrix Factorization can be combined with regression on a continuous response variable by minimizing a penalty function that adds a weighted regression error to a matrix factorization error. We show theoretically that as the weighting increases, the regression error in training …
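One way to write such a penalty function, as a sketch only: the symbols below (data matrix $X$, factors $W$ and $H$, response vector $y$, regression weights $\theta$, weighting $\lambda$) and the choice of squared norms are assumptions, since the excerpt does not give the exact form.

```latex
\min_{W \ge 0,\; H \ge 0,\; \theta}\;
  \underbrace{\lVert X - W H \rVert_F^2}_{\text{factorization error}}
  \;+\; \lambda\,
  \underbrace{\lVert y - H^{\top} \theta \rVert_2^2}_{\text{regression error}},
  \qquad \lambda \ge 0 .
```

Setting $\lambda = 0$ recovers ordinary unsupervised factorization, and larger $\lambda$ puts more weight on predicting the response.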


Contributions To Random Forest Variable Importance With Applications In R, Kelvyn K. Bladen Aug 2022

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A major focus in statistics is building and improving computational algorithms that can use data to predict a response. Two fundamental camps of research arise from such a goal. The first camp is researching ways to get more accurate predictions. Many sophisticated methods, collectively known as machine learning methods, have been developed for this very purpose. One such method that is widely used across industry and many other areas of investigation is called Random Forests.

The second camp of research is that of improving the interpretability of machine learning methods. This is worthy of attention when analysts desire to optimize …


Extension To Multidimensional Problems Of A Fuzzy-Based Explainable & Noise-Resilient Algorithm, Javier Viana, Stephan Ralescu, Kelly Cohen, Anca Ralescu, Vladik Kreinovich May 2021

Departmental Technical Reports (CS)

While Deep Neural Networks (DNNs) have shown incredible performance on a variety of data, they are brittle and opaque: they are easily fooled by the presence of noise, and the reasoning underlying their predictions or choices is difficult to understand. This focus on accuracy at the expense of interpretability and robustness caused little concern since, until recently, DNNs were employed primarily for scientific and limited commercial work. An increasing, widespread use of artificial intelligence and growing emphasis on user data protections, however, motivates the need for robust solutions with explainable methods and results. In this work, we extend a novel fuzzy based …


Regression Analysis: Graduation Rate In Kentucky Public High Schools, Rebecca Price Jan 2021

Mahurin Honors College Capstone Experience/Thesis Projects

Kentucky’s public high school graduation rates vary widely across the rural and urban regions of the state. In addition to its graduation rate, each of these schools has its own demographics, funding, teacher-student ratio, etc. that define the school’s identity. This research aims to analyze these variables, as well as other variables listed on each school's state report card, in order to create a model to predict any school’s graduation rate.

In order to create this model, data were taken on all public high schools in Kentucky from the Kentucky Department of Education’s School Report Card. Data were …


A Self-Contained Course In The Mathematical Theory Of Statistics For Scientists & Engineers With An Emphasis On Predictive Regression Modeling & Financial Applications., Tim Smith Apr 2019

Timothy Smith

Preface & Acknowledgments

This textbook is designed for a higher level undergraduate, perhaps even first year graduate, course for engineering or science students who are interested in learning how to use data analysis to make predictive models. While no statistical prerequisite knowledge is required to read this book, because the study is designed for the reader to truly understand the underlying theory rather than just learn how to read computer output, it would be best read with some familiarity with elementary statistics. The book is self-contained and the only true prerequisite knowledge is a solid …


Mathematical Analysis Of The Duck Migration To Louisiana, Brandon Garcia Apr 2019

Mathematics Senior Capstone Papers

The purpose of this project is to research the relationship between duck migration and weather patterns, more specifically to determine whether the rainfall and temperature in a given year affect the migration patterns of ducks. Duck hunters and conservationists alike have observed an overall decrease in the duck population in Louisiana over the past 70 years. Though some years have seen an increase, the population has not recovered to the level of the 1950s. These observations have led to many questions about what has happened to the ducks or where the ducks have gone. Using different forms …


A Self-Contained Course In The Mathematical Theory Of Statistics For Scientists & Engineers With An Emphasis On Predictive Regression Modeling & Financial Applications., Tim Smith Jan 2019

Open Access Textbooks

Preface & Acknowledgments

This textbook is designed for a higher level undergraduate, perhaps even first year graduate, course for engineering or science students who are interested in learning how to use data analysis to make predictive models. While no statistical prerequisite knowledge is required to read this book, because the study is designed for the reader to truly understand the underlying theory rather than just learn how to read computer output, it would be best read with some familiarity with elementary statistics. The book is self-contained and the only true prerequisite knowledge is a solid …


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …


A Logistic Regression Analysis Of First-Time College Students’ Completion Rates At The University Of Southern Mississippi, Jesse Homer Robinson May 2018

Honors Theses

The demand for employees with a college degree is steadily rising in many competitive job markets throughout the United States. This increase in demand has contributed to rising college enrollment rates throughout the country. However, unlike enrollment, the rate of college completion has not seen the same rise.

The goal of this study is to research and compare differences among those first-time college students who completed college within four years, six years, or did not complete. The primary source for data in this study was the Office of Institutional Research at USM. Both …


Interval Methods For Data Fitting Under Uncertainty: A Probabilistic Treatment, Vladik Kreinovich, Sergey P. Shary Dec 2015

Departmental Technical Reports (CS)

How can we estimate parameters from observations subject to errors and uncertainty? Very often, the measurement errors are random quantities that can be adequately described by probability theory. When we know that the measurement errors are normally distributed with zero mean, the (asymptotically optimal) Maximum Likelihood Method leads to the popular least squares estimates. In many situations, however, we do not know the shape of the error distribution; we only know that the measurement errors are located on a certain interval. Then the maximum entropy approach leads to a uniform distribution on this interval, and the Maximum Likelihood Method …
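The step from normal errors to least squares can be written out briefly. The notation here (observations $y_i$, model $f(x_i, \theta)$, known error variance $\sigma^2$, independent errors) is assumed for illustration and does not come from the report itself.

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{\bigl(y_i - f(x_i,\theta)\bigr)^2}{2\sigma^2} \right) \right]
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - f(x_i,\theta)\bigr)^2 .
```

The first term does not depend on $\theta$, so maximizing $\log L(\theta)$ is the same as minimizing $\sum_{i} \bigl(y_i - f(x_i,\theta)\bigr)^2$.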


Rows Vs. Columns: Randomized Kaczmarz Or Gauss-Seidel For Ridge Regression, Ahmed Hefny, Deanna Needell, Aaditya Ramdas Jul 2015

CMC Faculty Publications and Research

The Kaczmarz and Gauss-Seidel methods aim to solve a linear m × n system Xβ = y by iteratively refining the solution estimate; the former uses random rows of X to update β given the corresponding equations and the latter uses random columns of X to update corresponding coordinates in β. Interest in these methods was recently revitalized by a proof of Strohmer and Vershynin showing linear convergence in expectation for a randomized Kaczmarz method variant (RK), and a similar result for the randomized Gauss-Seidel algorithm (RGS) was later proved by Lewis and Leventhal. Recent work unified the analysis of …
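The row-based update can be sketched as follows for a consistent system, sampling each row with probability proportional to its squared norm as in the Strohmer and Vershynin variant. This is a plain randomized Kaczmarz sketch, not the ridge-regression variants the paper studies; the function name, iteration count, and synthetic test problem are assumptions for illustration.

```python
import numpy as np

def randomized_kaczmarz(X, y, iters=5000, seed=0):
    """Approximately solve X @ beta = y by projecting onto one random row equation per step."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    row_norms_sq = np.sum(X**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()  # sample rows by squared norm
    beta = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        residual = y[i] - X[i] @ beta
        # Project beta onto the hyperplane {b : X[i] @ b = y[i]}.
        beta += (residual / row_norms_sq[i]) * X[i]
    return beta

# Synthetic consistent system (made-up data).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
beta_true = rng.standard_normal(5)
y = X @ beta_true
beta_hat = randomized_kaczmarz(X, y)
```

The column-based Gauss-Seidel counterpart would instead pick a random column of X and update the single matching coordinate of β.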


A Stochastic Parameter Regression Approach For Time-Varying Relationship Between Gold And Silver Prices, Birsen Canan-Mcglone Aug 2012

Boise State University Theses and Dissertations

In this thesis, we studied the gold and silver relationship using stochastic-parameter regression models. We formulated their time-varying relationship as a state-space model and used the Kalman filter algorithm to estimate the stochastic regression parameters for gold and silver prices. The data set used in this thesis covers 31 years using the London fix prices between January 1969 and December 2000. The start date was selected as the first full year silver prices were included in the London fix prices. Our stochastic parameter regression model explained well the time-varying relationship between gold and silver prices. As a special case of …
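A state-space formulation of this kind can be sketched with a small Kalman filter. The excerpt does not specify the parameter dynamics, so the sketch below assumes random-walk coefficients, y_t = a_t + b_t x_t + v_t, and runs on synthetic data rather than London fix prices; the function name and the noise variances `q` and `r` are hypothetical.

```python
import numpy as np

def kalman_tv_regression(x, y, q=1e-4, r=1e-2):
    """Filter y_t = a_t + b_t * x_t + v_t, where (a_t, b_t) follow a random walk."""
    n = len(y)
    beta = np.zeros(2)       # current estimate of (a_t, b_t)
    P = np.eye(2) * 1e3      # large initial covariance (vague prior)
    Q = np.eye(2) * q        # assumed state-noise covariance
    betas = np.empty((n, 2))
    for t in range(n):
        # Predict: a random walk keeps the mean and inflates the covariance.
        P = P + Q
        h = np.array([1.0, x[t]])
        # Update with the new observation y[t].
        s = h @ P @ h + r              # innovation variance
        k = P @ h / s                  # Kalman gain
        beta = beta + k * (y[t] - h @ beta)
        P = P - np.outer(k, h) @ P
        betas[t] = beta
    return betas

# Synthetic data: the slope drifts from 1 to 2, the intercept stays at 0.5.
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
slope_true = np.linspace(1.0, 2.0, n)
y = 0.5 + slope_true * x + rng.normal(0.0, 0.1, n)
betas = kalman_tv_regression(x, y)
```

Each row of `betas` holds the filtered intercept and slope at that time step, so plotting the second column against time shows the estimated drift.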


Review Of Super Crunchers By Ian Ayres, Eric Gaze Jun 2009

Numeracy

Ayres, I. Super Crunchers: Why Thinking-by-Numbers Is the New Way to be Smart. (Bantam Dell Publishing Group, 2007). 272 pp. Hard cover $25 ISBN 978-0-553-80540-6.

Super Crunchers tells the story of how analyzing data is changing the ways in which decisions are made. We in the National Numeracy Network make a case for the importance of quantitative literacy by referring to how much quantitative information is now available to each of us: “a world awash in numbers.” Ian Ayres zeroes in on the people who are making a living crunching all of these data. From the seemingly innocuous (how …


Self-Consistency: A Fundamental Concept In Statistics, Thaddeus Tarpey, Bernard Flury Aug 1996

Mathematics and Statistics Faculty Publications

The term “self-consistency” was introduced in 1989 by Hastie and Stuetzle to describe the property that each point on a smooth curve or surface is the mean of all points that project orthogonally onto it. We generalize this concept to self-consistent random vectors: a random vector Y is self-consistent for X if E[X|Y] = Y almost surely. This allows us to construct a unified theoretical basis for principal components, principal curves and surfaces, principal points, principal variables, principal modes of variation and other statistical methods. We provide some general results on self-consistent random variables, give …


Introduction To Linear Algebra: Models, Methods, And Theory, Alan Tucker Jan 1995

Department of Applied Mathematics & Statistics Faculty Books

This book develops linear algebra around matrices. Vector spaces in the abstract are not considered, only vector spaces associated with matrices. This book puts problem solving and an intuitive treatment of theory first, with a proof-oriented approach intended to come in a second course, the same way that calculus is taught. The book's organization is straightforward: Chapter 1 has introductory linear models; Chapter 2 has the basics of matrix algebra; Chapter 3 develops different ways to solve a system of equations; Chapter 4 has applications, and Chapter 5 has vector-space theory associated with matrices and related topics such as pseudoinverses …


A Comparison Of Two Linear Nonparametric Regression Techniques, Sylvain Sardy May 1992

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This thesis presented a useful tool in regression. Nonparametric linear regression techniques were described in the general context of regression. A comparison of two of these techniques, kernel regression and iterative regression, showed various aspects of nonparametric linear regressors.


A Monte Carlo Study Of Non-Linear Regression Theory, Ya-Ming Liu May 1966

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Multiple regression provides the capability of using non-linear functions to fit various curvilinear surfaces. These non-linear functions are, however, linear in the parameters. Non-linear terms in the variables, such as $X^2$, $X^3$, $\ln X$, X, YX, are incorporated in a linear model. For example:

$$Y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 \ln x_2 + \epsilon$$

Many practical situations require the fitting of mathematical functions which are non-linear in the parameters and perhaps the variables. For example:

$$Y = b_1 e^{b_2 X} + \epsilon$$
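The practical difference between the two cases can be illustrated with a short sketch. The first model is fitted in one step by ordinary least squares because it is linear in its coefficients; the exponential model needs an iterative method, here a plain Gauss-Newton loop. The data, true coefficient values, starting point, and function name are all made up for illustration and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model linear in the parameters: Y = b0 + b1*x1 + b2*x1^2 + b3*ln(x2) + eps
x1 = rng.uniform(0.0, 2.0, 200)
x2 = rng.uniform(1.0, 5.0, 200)
b_lin_true = np.array([1.0, -2.0, 0.5, 3.0])  # made-up values
A = np.column_stack([np.ones_like(x1), x1, x1**2, np.log(x2)])
y_lin = A @ b_lin_true + rng.normal(0.0, 0.01, 200)
b_lin_hat, *_ = np.linalg.lstsq(A, y_lin, rcond=None)  # closed-form fit

# Model non-linear in the parameters: Y = b1 * exp(b2 * X) + eps
x = np.linspace(0.0, 2.0, 200)
b_nl_true = np.array([2.0, -1.5])  # made-up values
y_nl = b_nl_true[0] * np.exp(b_nl_true[1] * x) + rng.normal(0.0, 0.01, 200)

def gauss_newton(x, y, b, steps=20):
    """Fit y = b[0] * exp(b[1] * x) by repeatedly linearizing around the current b."""
    b = np.array(b, dtype=float)
    for _ in range(steps):
        e = np.exp(b[1] * x)
        resid = y - b[0] * e
        # Jacobian of the model with respect to (b[0], b[1]).
        J = np.column_stack([e, b[0] * x * e])
        delta, *_ = np.linalg.lstsq(J, resid, rcond=None)
        b += delta
    return b

b_nl_hat = gauss_newton(x, y_nl, b=[1.0, -1.0])
```

Undamped Gauss-Newton is used only for brevity; it depends on a reasonable starting point, and a damped method would be a safer choice on real data.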