#### The Impact Of Sample Size In Cross-Classified Multiple Membership Multilevel Models, Hyewon Chung, Jiseon Kim, Ryoungsun Park, Hyeonjeong Jean

*Journal of Modern Applied Statistical Methods*

A simulation study was conducted to examine parameter recovery in a cross-classified multiple membership multilevel model. No substantial relative bias was identified for the fixed effect or level-one variance component estimates. However, the variance component estimates for the level-two cross-classified multiple membership factor were substantially biased when the number of groups was relatively small.
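As a rough illustration of the relative bias criterion mentioned above, the sketch below uses the definition common in simulation studies: the mean estimate across replications minus the true value, divided by the true value. The abstract does not state the exact formula or any cutoff used, so the definition, the function name `relative_bias`, and the numbers are all assumptions made for illustration.

```python
from statistics import mean


def relative_bias(estimates, true_value):
    """Relative bias of replicated estimates: (mean(estimates) - true) / true."""
    if true_value == 0:
        raise ValueError("relative bias is undefined when the true value is 0")
    return (mean(estimates) - true_value) / true_value


# Hypothetical estimates of one variance component across five replications.
# These values are invented for illustration and do not come from the study.
estimates = [0.42, 0.47, 0.39, 0.45, 0.52]
true_value = 0.50

rb = relative_bias(estimates, true_value)
print(f"relative bias = {rb:.3f}")
```

A negative value indicates that the parameter is underestimated on average; how large a value counts as "substantial" depends on the cutoff a given study adopts.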

#### An Introduction To Psychological Statistics, Garett C. Foster, David Lane, David Scott, Mikki Hebl, Rudy Guerra, Dan Osherson, Heidi Zimmer

*Open Educational Resources Collection*

We are constantly bombarded by information, and finding a way to filter that information in an objective way is crucial to surviving this onslaught with your sanity intact. This is what statistics, and the logic we use in it, enable us to do. Through the lens of statistics, we learn to find the signal hidden in the noise when it is there and to know when an apparent trend or pattern is really just randomness. The study of statistics involves math and relies upon calculations of numbers. But it also relies heavily on how the numbers are chosen and how the ...

#### Analysis Of Ranked Gene Tree Probability Distributions Under The Coalescent Process For Detecting Anomaly Zones, Anastasiia Kim

*Shared Knowledge Conference*

In phylogenetic studies, gene trees are used to reconstruct species trees. Under the multispecies coalescent model, gene tree topologies may differ from that of the species tree. An incorrect gene tree topology (one that does not match the species tree) that is more probable than the correct one is termed an anomalous gene tree (AGT). Species trees that can generate such AGTs are said to be in the anomaly zone (AZ). In this region, the method of choosing the most common gene tree as the estimate of the species tree will be inconsistent and will converge to an incorrect species tree when ...

#### 41 - Data Exploration And Analysis For The Hemingway Measure Of Adult Connectedness, Gildardo Bautista-Maya, Ping Ye, Diane Cook

*Georgia Undergraduate Research Conference (GURC)*

We analyze the dataset collected from students participating in the Boy With A Ball (BWAB) program, a faith-based community outreach group, through the Hemingway *Measure of Adult Connectedness*©, a questionnaire measuring the social connectedness of adolescents. First, we approach the data using the conventional method provided by the Hemingway website. We then identify which questions are strong determiners of whether a student has completed the BWAB program. With the goal of using logistic regression, we reduce the set of questions to only those identified as significant by other methods. These methods include linear regression, decision ...

#### Probabilities Involving Standard Trirectangular Tetrahedral Dice Rolls, Rulon Olmstead, Doneliezer Baize

*Rose-Hulman Undergraduate Mathematics Journal*

The goal is to calculate probabilities involving rolls of irregularly shaped dice. Here, an attempt is made to model the probabilities of rolling standard tri-rectangular tetrahedral dice on a hard surface, such as a table top. The vertices and edges of a tetrahedron were projected onto the surface of a sphere centered at the center of mass of the tetrahedron. By calculating the surface areas bounded by the resultant geodesics, baseline probabilities were obtained. Using a 3D printer, dice were constructed of uniform density and the results of rolling them were recorded. After calculating the corresponding confidence intervals, the ...
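The baseline step described above (spherical areas seen from the center of mass) can be sketched as follows. This is not the authors' code: it swaps in the Van Oosterom and Strackee solid-angle formula, which gives the area of a triangle's projection onto a unit sphere, and it assumes that a "standard" die has three mutually perpendicular edges of equal length 1 meeting at the origin. The function names and the leg lengths are hypothetical.

```python
import math


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def solid_angle(p, a, b, c):
    """Solid angle subtended at point p by triangle (a, b, c)."""
    r1, r2, r3 = sub(a, p), sub(b, p), sub(c, p)
    n1, n2, n3 = norm(r1), norm(r2), norm(r3)
    numerator = abs(dot(r1, cross(r2, r3)))
    denominator = (n1 * n2 * n3 + dot(r1, r2) * n3
                   + dot(r1, r3) * n2 + dot(r2, r3) * n1)
    # atan2 keeps the result correct when the denominator is negative.
    return 2.0 * math.atan2(numerator, denominator)


def baseline_probabilities(a=1.0, b=1.0, c=1.0):
    """Face probabilities proportional to solid angle from the centroid.

    Leg lengths a, b, c are assumptions; the abstract does not give them.
    """
    O, A, B, C = (0.0, 0.0, 0.0), (a, 0.0, 0.0), (0.0, b, 0.0), (0.0, 0.0, c)
    # For uniform density, the center of mass is the mean of the vertices.
    centroid = (a / 4.0, b / 4.0, c / 4.0)
    faces = {"OAB": (O, A, B), "OAC": (O, A, C),
             "OBC": (O, B, C), "ABC": (A, B, C)}
    return {name: solid_angle(centroid, *tri) / (4.0 * math.pi)
            for name, tri in faces.items()}


probs = baseline_probabilities()
for face, p in probs.items():
    print(f"{face}: {p:.4f}")
```

Because the four faces enclose the centroid, the solid angles sum to 4π and the probabilities sum to 1. This purely geometric model ignores rolling dynamics, which is presumably why the abstract calls the resulting values baseline probabilities and compares them against recorded rolls.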

#### Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, Yujie Song

*Major Papers*

In this major paper, we study the influence of structural breaks in a financial market model with high-dimensional data. We present a model that is capable of detecting changes in factor loadings, determining the number of factors, and detecting the break date. We consider both the case where the break date is known and the case where it is unknown, and we identify the type of instability. For the unknown break date case, we propose a group-LASSO estimator to determine the number of pre- and post-break factors, the break date, and the existence of instability in the factor loadings when the number of factors is constant. We ...

#### Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao

*Major Papers*

In this major paper, we use high-dimensional models to analyze macroeconomic data that are influenced by a break point. In particular, we consider detecting the break point and study the changes in the number of factors and the factor loadings under structural instability.

Concretely, we propose two factor models that explain the processes of the pre- and post-break periods. Then, we consider the break point as known or unknown. In both situations, we derive the shrinkage estimators by minimizing the penalized least squares function and calculate the estimators of the numbers of pre- and post-break factors ...

#### Using Cyclical Components To Improve The Forecasts Of The Stock Market And Macroeconomic Variables, Kenneth R. Szulczyk, Shibley Sadique

*Journal of Modern Applied Statistical Methods*

Economic variables such as stock market indices, interest rates, and national output measures contain cyclical components. Forecasting methods excluding these cyclical components yield inaccurate out-of-sample forecasts. Accordingly, a three-stage procedure is developed to estimate a vector autoregression (VAR) with cyclical components. A Monte Carlo simulation shows the procedure estimates the parameters accurately. Subsequently, a VAR with cyclical components improves the root-mean-square error of out-of-sample forecasts by 50% for a stock market model with macroeconomic variables.

#### Understanding Sexual Violence Against Women, Maria Martinez

*Annual Symposium on Biomathematics and Ecology: Education and Research*

#### Statistical Modeling Of CO2 Flux Data, Fang He

*Electronic Thesis and Dissertation Repository*

Carbon dioxide (CO2) flux is important for agriculture and carbon cycle studies. Only a small proportion of the land is currently covered by proper equipment to directly collect CO2 flux data. The CO2 flux data has an obvious annual cycle with the phase changing from year to year. How to build a model to estimate the annual effect and seasonal dynamics is a challenging task. With the help of the Moderate Resolution Imaging Spectroradiometer (MODIS) which is carried by NASA satellites, corresponding data, such as normalized difference vegetation index (NDVI), is freely available from NASA. Our goals are modeling the ...

#### Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, Carlos Narciso Bouza Herrera, Prayas Sharma

*Journal of Modern Applied Statistical Methods*

The use of randomized response procedures allows diminishing the number of non-responses and increasing the accuracy of the responses. A new sampling strategy is developed where the reports are scrambled using the procedure of Gupta and Thornton. The estimator of the mean as well as the errors are developed for the Rao-Hartley-Cochran and Ranked Sets Sampling designs. The proposals are compared with the original model based on the use of simple random sampling.

#### Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, Benmei Liu, Erin Hennessy, April Oh, Laura A. Dwyer, Linda Nebeling

*Journal of Modern Applied Statistical Methods*

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hotdeck, were examined and compared for imputing categorical variables with high missing rates from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.

#### Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, Kamaljit Kaur, Kalpana K. Mahajan, Sangeeta Arora

*Journal of Modern Applied Statistical Methods*

Bayesian and semi-Bayesian estimators of the parameters of the generalized inverse Weibull distribution are obtained using Jeffreys’ prior and an informative prior under specific assumptions about the loss function. Using simulation, the relative efficiency of the proposed estimators is obtained under different set-ups. A real-life example is also given.

#### The Periglacial Landscape Of Mars: Insight Into The 'Decameter-Scale Rimmed Depressions' In Utopia Planitia, Arya Bina

*Electronic Thesis and Dissertation Repository*

Currently, Mars appears to be in a ‘frozen’ and ‘dry’ state, with the clear majority of the planet’s surface maintaining year-round sub-zero temperatures. However, the discovery of features consistent with landforms found in periglacial environments on Earth suggests a climate history for Mars that may have involved freeze and thaw cycles. Such landforms include hummocky, polygonised, scalloped, and pitted terrains, as well as ice-rich deposits and gullies, along the mid- to high-latitude bands, typically no lower than 20° N/S. The detection of near-surface and surface ice via the Phoenix lander, excavation of ice via recent impact cratering ...

#### Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler

*SMU Data Science Review*

Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool ...

#### Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi

*SMU Data Science Review*

In this paper, we present a machine learning based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.

#### Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels

*SMU Data Science Review*

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as *recommended* or *non-recommended* affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...

#### Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra

*SMU Data Science Review*

In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over \$160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings in both daily and long-term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to ...

#### Of Typicality And Predictive Distributions In Discriminant Function Analysis, Lyle W. Konigsberg, Susan R. Frankenberg

*Human Biology Open Access Pre-Prints*

While discriminant function analysis is an inherently Bayesian method, researchers attempting to estimate ancestry in human skeletal samples often follow discriminant function analysis with the calculation of frequentist-based typicalities for assigning group membership. Such an approach is problematic in that it fails to account for admixture and for variation in why individuals may be classified as outliers, or non-members of particular groups. This paper presents an argument and methodology for employing a fully Bayesian approach in discriminant function analysis applied to cases of ancestry estimation. The approach requires adding the calculation, or estimation, of predictive distributions as the final step ...

#### Error Estimates For Projection-Based Dynamic Augmented Lagrangian Boundary Condition Enforcement, With Application To Fluid–Structure Interaction, Yue Yu, David Kamensky, Ming-Chen Hsu, Xin Yang Lu, Yuri Bazilevs, Thomas J.R. Hughes

*Mechanical Engineering Publications*

In this work, we analyze the convergence of the recent numerical method for enforcing fluid–structure interaction (FSI) kinematic constraints in the immersogeometric framework for cardiovascular FSI. In the immersogeometric framework, the structure is modeled as a thin shell, and its influence on the fluid subproblem is imposed as a forcing term. This force has the interpretation of a Lagrange multiplier field supplemented by penalty forces, in an augmented Lagrangian formulation of the FSI kinematic constraints. Because of the non-matching fluid and structure discretizations used, no discrete *inf-sup* condition can be assumed. To avoid solving (potentially unstable) discrete saddle point ...