Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 10 of 10

Full-Text Articles in Physical Sciences and Mathematics

Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we explore and present a method of finding characteristics of a restaurant from its reviews using machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features, in order to examine the efficacy of the algorithms for the task. Both XGBoost and logistic regression are examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experiences. Our analysis makes use of review data made publicly available by Yelp. Key bigrams extracted were non-specific to the …
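As a rough illustration of what identifying bigrams associated with positive and negative reviews can look like, here is a minimal sketch. It is not the authors' pipeline (which relies on XGBoost and logistic regression); it only counts adjacent word pairs and contrasts their frequencies between two groups of reviews. The function names and the toy reviews are hypothetical.

```python
from collections import Counter
import re


def bigrams(text):
    """Split text into lowercase word tokens and return adjacent pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def bigram_counts(reviews):
    """Count every bigram across a list of review strings."""
    counts = Counter()
    for review in reviews:
        counts.update(bigrams(review))
    return counts


def contrast(positive_reviews, negative_reviews):
    """Return count(positive) - count(negative) for each bigram seen.

    Large positive values suggest a phrase typical of positive reviews,
    large negative values a phrase typical of negative reviews.
    """
    pos = bigram_counts(positive_reviews)
    neg = bigram_counts(negative_reviews)
    return {bg: pos[bg] - neg[bg] for bg in set(pos) | set(neg)}


# Toy data, invented for illustration only.
positive = ["great service and great food", "great service again"]
negative = ["slow service and cold food"]
scores = contrast(positive, negative)
```

A raw count difference is the simplest possible score; a fitted model's coefficients or feature importances would be the natural replacement when following the abstract more closely.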


Decision Trees: Predicting Future Losses For Insurance Data, Amanda Lahrmann Jan 2018

Williams Honors College, Honors Research Projects

Big data is a term that has come into the spotlight for companies in recent years. Data analysis and business intelligence have become prominent sectors of companies and agencies. But what is big data? How has it impacted large companies and agencies? Why must it be embraced?

The best way to approach utilizing a big data set is to establish a question to answer. For this data set, the question that must be answered is “What variables cause a loss to occur?” To answer this question, first, we must understand what is meant by a “loss”, and take a look …


Using The R Library Rpanel For Gui-Based Simulations In Introductory Statistics Courses, Ryan M. Allison May 2012

Statistics

As a student, I noticed that the statistical package R (http://www.r-project.org) would offer several benefits in the classroom. One benefit of the package is its free and open-source nature. This would be a great benefit for instructors and students alike, since it costs nothing to use, unlike other statistical packages. Because of this, students could continue using the program after their statistics courses and into their professional careers. It would be good to expose students, while they are in school, to a tool that professionals use in industry. R also has powerful …


Two-Factor Agricultural Experiment With Repeated Measures On One Factor In A Complete Randomized Design, Armando Garsd, María Del C. Fabrizio, María V. López Apr 1995

Conference on Applied Statistics in Agriculture

A typical agricultural experiment involves comparisons of several treatments at different points in time. The ensuing lack of independence between observations of the same experimental unit may then impair the attainment of statistical significance by the standard analysis of variance, and calls for the application of more powerful methods. This paper addresses one such method, the so-called two-factor experiment with repeated measures on one factor. We discuss the adequacy of this model in the context of three concrete examples drawn from agricultural experimentation.


The Use Of Contingency Table Analysis As A Robust Technique For Analysis Of Variance, Mei-Eing Chiu May 1982

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The purpose of this paper is to compare Analysis of Variance with Contingency Table Analysis when the data being analyzed do not satisfy Analysis of Variance assumptions. The criteria for comparison are the powers of the Standard variance-ratio and the Chi-square test.

The test statistics and powers were obtained by Monte Carlo simulation.

1. The test statistic was calculated for each of 100 trials. This process was repeated 12 times, each time with a different combination of means and variances.

2. Powers were obtained for each of 12 combinations of means and variances.
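The two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's procedure: it covers only the variance-ratio (F) test, it estimates the critical value by simulation under the null hypothesis instead of reading it from an F table, and the group size, the number of null replicates, the seed, and all function names are hypothetical choices.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA variance ratio: between-group MS / within-group MS."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate(means, sds, n_per_group, trials, rng):
    """Draw normal samples for each group and return one F value per trial."""
    return [
        f_statistic(
            [[rng.gauss(m, s) for _ in range(n_per_group)]
             for m, s in zip(means, sds)]
        )
        for _ in range(trials)
    ]


def estimate_power(means, sds, n_per_group=10, trials=100, alpha=0.05, seed=0):
    """Fraction of trials in which the F statistic exceeds the critical value."""
    rng = random.Random(seed)
    k = len(means)
    # Assumption: critical value taken as the empirical (1 - alpha) quantile
    # of F under equal means and unit variances, from 2000 null replicates.
    null_stats = sorted(simulate([0.0] * k, [1.0] * k, n_per_group, 2000, rng))
    critical = null_stats[int((1 - alpha) * len(null_stats))]
    alt_stats = simulate(means, sds, n_per_group, trials, rng)
    return sum(f > critical for f in alt_stats) / trials


# One hypothetical combination with clearly separated means, and one null case.
power_shifted = estimate_power([0.0, 0.0, 3.0], [1.0, 1.0, 1.0])
power_null = estimate_power([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Calling `estimate_power` once per combination of means and variances would give one power estimate per combination; passing unequal values in `sds` probes the case where the equal-variance assumption fails.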

Whether Analysis of Variance or Contingency Table Analysis is a …


Exact Analysis Of Variance With Unequal Variances, Noriaki Yanagi May 1980

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The purpose of this paper was to present the exact analysis of variance with unequal variances. Bishop presented a new procedure for the r-way layout ANOVA. In this paper, one- and two-way layout ANOVA were explained, and Bishop's method and the standard method were compared using a Monte Carlo method.


A µ-Model Approach On The Cell Means: The Analysis Of Full, Design Models With Non-Orthogonal Data, Richard Van Koningsveld May 1979

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This work considers the application of a µ-model approach on the cell means to a special yet important class of experimental designs. These include full factorial, completely nested, and mixed models with one or more observations per cell. By limiting attention to full models, an approach to the general data situation is developed which is both conceptually simple and computationally advantageous.

Conceptually, the method is simple because the design related effects are defined as if the cell means are single observations. This leads to a rather simple algorithm for generating main effect contrasts, from which associated interaction contrasts can also …


Linear Comparisons In Multivariate Analysis Of Variance, Hsin-Ming Tzeng Jan 1976

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The analysis of variance was created by Ronald Fisher in 1923. It is the most widely used, and a fundamentally useful, approach to studying differences among treatment averages.


Principal Component Factor Analysis, Kuang-Ming Chu Jan 1974

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

The principal-factor solution is probably the most widely used technique in factor analysis and a relatively straightforward method to determine the minimum number of independent dimensions needed to account for most of the variance in the original set of variables.

The principal components approach to parsimony was first proposed by Karl Pearson (1901), who studied the problem for the case of nonstochastic variables and in a different context. Hotelling provided the full development of the method (1933), and Thomson (1947) was the first to apply it to principal factor analysis.

This method was first developed to deal with …


Selecting The Best Linear Model From A Subset Of All Possible Models For A Given Set Of Predictors In A Multiple Linear Regression Analysis, David L. Jensen May 1972

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Sixteen "model building" and "model selection" procedures commonly encountered in industry, all of which were initially alleged to be capable of identifying the best model from the collection of 2^k possible linear models corresponding to a given set of k predictors in a multiple linear regression analysis, were individually summarized and subsequently evaluated by considering their comparative advantages and limitations from both a theoretical and a practical standpoint. It was found that none of the proposed procedures were absolutely infallible and that several were actually unsuitable. However, it was also found that most of these techniques could still be …
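The number of candidate models grows as a power of two because each of the k predictors is either included in a model or left out. The short sketch below enumerates those subsets for a small, hypothetical set of predictor names; it only illustrates the size of the search space and does not implement any of the sixteen procedures the thesis evaluates.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every subset of the predictors, from the empty set to the full set."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


# Hypothetical predictor names, used only for illustration.
predictors = ["x1", "x2", "x3"]
candidate_models = list(all_subsets(predictors))
```

With three predictors this produces eight candidate models, including the intercept-only model (the empty subset). Since the count doubles with every added predictor, exhaustive enumeration becomes expensive quickly as k grows.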