Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Data Science

2022

Articles 1 - 23 of 23

Full-Text Articles in Statistical Models

Classification Of Breast Cancer Histopathological Images Using Semi-Supervised GANs, Balaji Avvaru, Nibhrat Lohia, Sowmya Mani, Vijayasrikanth Kaniti Sep 2022

SMU Data Science Review

Breast cancer is diagnosed more frequently than skin cancer in women in the United States. Most breast cancer cases are diagnosed in women, while children and men are less likely to develop the disease. Various tissues in the breast grow uncontrollably, resulting in breast cancer. Different treatments analyze microscopic histopathology images for diagnosis that help accurately detect cancer cells. Deep learning is one of the evolving techniques to classify images where accuracy depends on the volume and quality of labeled images. This study used various pre-trained models to train the histopathological images and analyze these models to create a new …


Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan Sep 2022

SMU Data Science Review

Millions of people live with diabetes worldwide [7]. To mitigate some of the many symptoms associated with diabetes, an estimated 350,000 people in the United States rely on insulin pumps [17]. For many of these people, how effectively their insulin pump performs is the difference between sleeping through the night and a life-threatening emergency treatment at a hospital. Three programmed insulin pump therapy settings governing effective insulin pump function are: Basal Rate (BR), Insulin Sensitivity Factor (ISF), and Carbohydrate Ratio (ICR). For many people using insulin pumps, these therapy settings are often not correct, given their physiological needs. While …


Application Of Probabilistic Ranking Systems On Women’s Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler Sep 2022

SMU Data Science Review

Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, which is the current market leader in the ranking space of junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements for the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three …


Understanding Consumers' Use Experience On Electrically Heated Jacket: A Study On Online Review Using Topic Modeling, Md Nakib-Ul Hasan Aug 2022

LSU Doctoral Dissertations

The demand for heated jackets is anticipated to be fuelled by frequent temperature drops, severe winter weather, and increasing outdoor activities. Electrically heated jackets (EHJ) are primarily marketed through online distribution channels and expansion of online sales channels is expected to boost the global market. Consumers are increasingly relying on online reviews from other consumers to help them decide what to buy. Businesses also actively monitor and manage their online reviews to build trust in their brand and make it more likely that customers will buy. Traditional approaches for assessing customer behavior, such as market research surveys and focus groups, …


A Transformer-Based Classification System For Volcanic Seismic Signals, Anthony P. Rinaldi, Cindy Mora Stock, Cristián Bravo Roman, Alexander Hemming Aug 2022

Undergraduate Student Research Internships Conference

Monitoring volcanic events as they occur is a task that, to this day, requires significant human capital. The current process requires geologists to monitor seismographs around the clock, making it extremely labour-intensive and inefficient. The ability to automatically classify volcanic events as they happen in real-time would allow for quicker responses to these events by the surrounding communities. Timely knowledge of the type of event that is occurring can allow these surrounding communities to prepare or evacuate sooner depending on the magnitude of the event. Up until recently, not much research has been conducted regarding the potential for machine learning …


Investigation Of Key Factors To Earthquake Insurance Take-Up Rates In Quebec And British Columbia Households And Prediction Model Building, Yongcheng Jiang Aug 2022

Undergraduate Student Research Internships Conference

Maintaining an adequate earthquake insurance take-up rate could protect the insurance industry from systemic failure. Past research has shown that British Columbia and Quebec have significant differences in earthquake insurance take-up rate. This report investigates key factors from the structure (default options and various types) of the insurance plan and personal characteristics along with socioeconomic/demographic profiles that affect the demand for earthquake protection in the form of insurance. The report also provides a prediction model for earthquake insurance take-up rate. The results show an importance ranking of key factors of earthquake insurance take-up; the three most important are …


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning, Asma Baccouche Aug 2022

Electronic Theses and Dissertations

The recent rise of big data technology surrounding the electronic systems and developed toolkits gave birth to new promises for Artificial Intelligence (AI). With the continuous use of data-centric systems and machines in our lives, such as social media, surveys, emails, reports, etc., there is no doubt that data has become the center of attention for scientists and motivated them to provide more decision-making and operational support systems across multiple domains. With the recent breakthroughs in artificial intelligence, the use of machine learning and deep learning models has achieved remarkable advances in computer vision, ecommerce, cybersecurity, and healthcare. Particularly, numerous …


Statistical Extensions Of Multi-Task Learning With Semiparametric Methods And Task Diagnostics, Nikolay Miller Jun 2022

Mathematics & Statistics ETDs

In this dissertation, I propose new approaches to multi-task learning, inspired by statistical model diagnostics and semiparametric and additive modeling. The newly designed additive multi-task model framework allows for flexible estimation of multi-task parametric and nonparametric effects by using an extension of the backfitting algorithm. Further, I propose new methods for statistical task diagnostics, which allow for the identification and remedy of outlier tasks, based on task-specific performance metrics and their empirical distributions. I perform a deep examination of the well-established multi-task kernel method and achieve theoretical and experimental contributions. Lastly, I propose a two-step modeling approach to multi-task modeling, …


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research using rigorous DFT calculations, high throughput ab-initio simulations for 23 AlN alloys are generated.

This research is the first to report strong enhancements of piezoelectric properties …


A Bayesian Programming Approach To Car-Following Model Calibration And Validation Using Limited Data, Franklin Abodo Jun 2022

FIU Electronic Theses and Dissertations

Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadway networks. Underlying these simulators are mathematical models of microscopic driver behavior from which macroscopic measures of flow and congestion can be recovered. Many models are intended to apply to only a subset of possible traffic scenarios and roadway configurations, while others do not have any explicit constraint on their applicability. Work zones on highways are one scenario for which no model invented to date has been shown to accurately reproduce realistic driving behavior. This makes it difficult to optimize for safety and other …


Adjusting Community Survey Data Benchmarks For External Factors, Allen Miller, Nicole M. Norelli, Robert Slater, Mingyang N. Yu Jun 2022

SMU Data Science Review

Using U.S. resident survey data from the National Community Survey in combination with public data from the U.S. Census and additional sources, a Voting Regressor Model was developed to establish fair benchmark values for city performance. These benchmarks were adjusted for characteristics the city cannot easily influence that contribute to confidence in local government, such as population size, demographics, and income. This adjustment allows for a more meaningful comparison and interpretation of survey results among individual cities. Methods explored for the benchmark adjustment included cluster analysis, anomaly detection, and a variety of regression techniques, including random forest, ridge, decision …


Data Ethics: An Investigation Of Data, Algorithms, And Practice, Gabrialla S. Cockerell May 2022

Honors Projects

This paper encompasses an examination of defective data collection, algorithms, and practices that continue to be cycled through society under the illusion that all information is processed uniformly, and technological innovation consistently parallels societal betterment. However, vulnerable communities, typically the impoverished and racially discriminated, get ensnared in these harmful cycles due to their disadvantages. Their hindrances are reflected in their information due to the interconnectedness of data, such as race being highly correlated to wealth, education, and location. However, their information continues to be analyzed with the same measures as populations who are not significantly affected by racial bias. Not …


Impact Of Climate Oscillations/Indices On Hydrological Variables In The Mississippi River Valley Alluvial Aquifer, Meena Raju May 2022

Theses and Dissertations

The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between hydrological variables, and to evaluate the influence of global climate indices on hydrological variables. The non-parametric MMK and Pettitt’s tests were used to analyze trends and change points. PCC and streamflow elasticity analysis were used to analyze the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and MLR analysis …


Finding A Representative Distribution For The Tail Index Alpha, α, For Stock Return Data From The New York Stock Exchange, Jett Burns May 2022

Electronic Theses and Dissertations

Statistical inference is a tool for creating models that can accurately display real-world events. Special importance is given to the financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is α. This research finds a representative distribution that models α. The absolute values of standardized stock returns from the Center for Research in Security Prices (CRSP) are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The …


Early-Warning Alert Systems For Financial-Instability Detection: An HMM-Driven Approach, Xing Gu Apr 2022

Electronic Thesis and Dissertation Repository

Regulators’ early intervention is crucial when the financial system is experiencing difficulties. Financial stability must be preserved to avert banks’ bailouts, which hugely drain government’s financial resources. Detecting in advance periods of financial crisis entails the development and customisation of accurate and robust quantitative techniques. The goal of this thesis is to construct automated systems via the interplay of various mathematical and statistical methodologies to signal financial instability episodes in the near-term horizon. These signal alerts could provide regulatory bodies with the capacity to initiate an appropriate response that will thwart or at least minimise the occurrence of a financial crisis. …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore Feb 2022

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …


Transition Metal Phosphides For High Performance Electrochemical Energy Storage Devices, Amina Saleh Jan 2022

Theses and Dissertations

Electrochemical energy storage technologies are nowadays playing a leading role in the global effort to address the energy challenges. A lot of attention has been devoted to designing hybrid devices known as supercapatteries which combine the merits of supercapacitors (high power density) and rechargeable batteries (high energy density). Transition metal phosphides (TMP) are a rising star for supercapattery anode materials thanks to their high conductivity, metalloid characteristics, and kinetic favorability for fast electron transport. Herein, new TMP-based materials were synthesized for use as supercapattery positive electrodes, via a multifaceted approach to yield devices enjoying concurrently high power and energy densities. …


Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused state and nationwide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector in particular suffered greatly: illegal fishing increased, its goods are perishable, it is seasonal in some places, and imports and exports slowed. Foot traffic data helps business owners decide how much to order, how many employees to schedule, etc. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …


Realtime Event Detection In Sports Sensor Data With Machine Learning, Mallory Cashman Jan 2022

Honors Theses and Capstones

Machine learning models can be trained to classify time-series-based sports motion data, without reliance on assumptions about the capabilities of the users or sensors. This can be applied to predict the count of occurrences of an event in a time period. The experiment for this research uses lacrosse data, collected in partnership with SPAITR - a UNH undergraduate startup developing motion tracking devices for lacrosse. Decision Tree and Support Vector Machine (SVM) models are trained and perform with high success rates. These models improve upon previous work in human motion event detection and can be used as a reference …


Applying Machine Learning Algorithms For Face Mask Detections, Mackenzie Frato Jan 2022

Williams Honors College, Honors Research Projects

Goal: Apply multiple machine learning techniques to face mask images to detect whether a student is wearing a face mask, wearing it incorrectly, or not wearing one at all. Methodology: Use 2-3 different machine learning techniques to develop this program. Will choose these techniques as I research over the semester. The best technique will be the final one used, but many will be explored. Validation techniques will be used to see which is the best technique. Timeline: Choose Dataset - October 1st, Choose techniques - October 31st, Research techniques/validation - November 31st, Begin writing code - December 13th, Finish code - …


Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su Jan 2022

Theses and Dissertations--Statistics

When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. There is a variety of ways that these adjustments are performed. In this work, we will consider four approaches to adjustment utilized by researchers in various fields. We will compare the efficacy of these methods to what we call the “true model method”, fitting a multiple linear regression model in which adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …


A Monte Carlo Simulation Of Rat Choice Behavior With Interdependent Outcomes, Michelle A. Frankot Jan 2022

Graduate Theses, Dissertations, and Problem Reports

Preclinical behavioral neuroscience often uses choice paradigms to capture psychiatric symptoms. In particular, the subfield of operant research produces nested datasets with many discrete choices in a session. The standard analytic practice is to aggregate choice into a continuous variable and analyze using ANOVA or linear regression. However, choice data often have multiple interdependent outcomes of interest, violating an assumption of general linear models. The aim of the current study was to quantify the accuracy of linear mixed-effects regression (LMER) for analyzing data from a 4-choice operant task called the Rodent Gambling Task (RGT), which measures decision-making in the context …