Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 30 of 51

Full-Text Articles in Statistical Models

A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang May 2024

Computational and Data Sciences (PhD) Dissertations

This research introduces an analytical improvement to the Multivariate Ljung-Box test that addresses significant deviations of the original test from the nominal Type I error rates under almost all scenarios. Prior attempts to mitigate this issue have been directed at modification of the test statistics or correction of the test distribution to achieve precise results in finite samples. In previous studies focused on designing corrections to the univariate Ljung-Box test, a method that specifically adjusts the test rejection region has been the most successful at attaining the best Type I error rates. We adopt the same approach for the more complex, …


Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth May 2024

Electronic Theses, Projects, and Dissertations

The longstanding prevalence of hypertension, often undiagnosed, poses significant risks of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged 18-39 years. The research questions were: (Q1.) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2.) What machine learning algorithms are suited for effectively predicting hypertension? (Q3.) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1.) Performing feature selection using a binary-classification logistic regression algorithm reveals an array of 30 most influential factors at an …


Modeling Of Covid-19 Clinical Outcomes In Mexico: An Analysis Of Demographic, Clinical, And Chronic Disease Factors, Livia Clarete Feb 2024

Dissertations, Theses, and Capstone Projects

This study explores COVID-19 clinical outcomes in Mexico, focusing on demographic, clinical, and chronic disease variables to develop predictive models. In the binary classification task, the Ada Boost Classifier distinguishes survivors from non-survivors, with age, sex, ethnicity, and chronic medical conditions influencing outcomes. In multiclass classification, the Gradient Boosting Classifier categorizes patients into outcome groups.

Demographic variables, especially age, are crucial for predicting COVID-19 outcomes for both the binary and multiclass classification tasks. Clinical information about previous conditions, including chronic diseases, also holds relevance, especially diabetes, immunocompromise, and cardiovascular diseases. These insights inform public health measures and healthcare strategies, emphasizing …


Making Sense Of Making Parole In New York, Alexandra Mcglinchy Feb 2024

Dissertations, Theses, and Capstone Projects

For many individuals incarcerated in New York, the initial step toward freedom begins with an interview with the Board of Parole. This process, however, is frequently a complex and challenging one, characterized by repeated denials and extended incarcerations. The disparity in outcomes – where one individual may receive over 20 denials and another is granted parole on their first attempt – highlights the ambiguity and inconsistency in the parole decision-making process. This project aims to clarify the factors that influence parole decisions by concentrating on measurable variables. These include age, race, duration of sentence served, proportion of sentence served, type …


A Bayesian Inversion For Emissions And Export Productivity Across The End-Cretaceous Boundary, Alexander A. Cox Jan 2024

Dartmouth College Master’s Theses

The end-Cretaceous mass extinction was marked by both the Chicxulub impact and the ongoing emplacement of the Deccan Traps flood basalt province. Both of these events perturbed the environment by the emission of climate-active volatiles, primarily CO2 and SO2. To understand the mechanism of extinction, we must disentangle the timing, duration, and intensity of volcanic and meteoritic environmental forcings. In this thesis, we used a parallel Markov chain Monte Carlo approach to invert for the aforementioned volatile emissions, export productivity, and remineralization from 67 to 65 million years ago using the LOSCAR (Long-term Ocean-atmosphere-Sediment CArbon cycle Reservoir) model. The parallel …


Simulation Of Wave Propagation In Granular Particles Using A Discrete Element Model, Syed Tahmid Hussan Jan 2024

Electronic Theses and Dissertations

Understanding the Bender Element mechanism and using Particle Flow Code (PFC) to simulate seismic wave behavior are important for testing the dynamic behavior of soil particles. Both discrete and finite element methods can be used to simulate wave behavior. However, the Discrete Element Method (DEM) is the most suitable, as micro-scale soil particles cannot be fully treated as a continuous specimen like a rod or a piece of aluminum. Recently, DEM has been widely used to study the mechanical properties of soils at the particle level, treating the particles as balls. This study presents a comparative analysis of Voigt and Best …


Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand or internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze their own data to identify potential changes the company might investigate to drive better performance. Based …


Forecasting Covid-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross Aug 2023

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance for pandemic response and about potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Addressing The Impact Of Time-Dependent Social Groupings On Animal Survival And Recapture Rates In Mark-Recapture Studies, Alexandru M. Draghici Jun 2023

Electronic Thesis and Dissertation Repository

Mark-recapture (MR) models typically assume that individuals under study have independent survival and recapture outcomes. One such model of interest is known as the Cormack-Jolly-Seber (CJS) model. In this dissertation, we conduct three major research projects focused on studying the impact of violating the independence assumption in MR models along with presenting extensions which relax the independence assumption. In the first project, we conduct a simulation study to address the impact of failing to account for pair-bonded animals having correlated recapture and survival fates on the CJS model. We examined the impact of correlation on the likelihood ratio test (LRT), …


A Probabilistic Exploration Of Food Supplementation And Assistance, Logan Mattingly May 2023

Honors College Theses

Food insecurity is a stark threat that affects households throughout our country. Dietary insufficiency manifests itself in ways that affect health and public safety. According to researchers, individuals who suffer from food insecurity have a higher risk of aggression, anxiety, suicide ideation and depression. These problems tend to be unequally distributed, falling more heavily on households with lower income. In this work, an exploratory analysis within these data sets will be performed to examine the socio-economic, biographical, nutritional, and geographical principal components of food insecurity among survey participants and how the US Supplemental Nutrition Assistance Program (SNAP) affects …


Multidimensional Investigation Of Tennessee’s Urban Forest, Jillian L. Gorrell May 2023

Doctoral Dissertations

Preserving existing trees in urban areas and properly cultivating urban forest conservation and management opportunities is valuable to the ever-growing urban environment and necessary for creating optimal experiences and educational tools to meet the needs of increasing urban populations. This dissertation contains studies investigating several facets of the urban forest, including environmental effects of deforestation and urbanization, tree equity, and urban forest facility management and accessibility. Community education and outreach at arboreta about the importance of the tree canopy can help promote environmental stewardship. A digital questionnaire was electronically distributed to representatives of arboreta certified through the Tennessee Division of …


Quantification Of Various Types Of Biases In Large Language Models, Sudhashree Sayenju Apr 2023

Doctor of Data Science and Analytics Dissertations

Natural Language Processing (NLP) systems are used everywhere on the internet, from search engines and language translation to more advanced systems like voice assistants and customer service. Since humans are always on the receiving end of NLP technologies, it is very important to analyze whether or not the Large Language Models (LLMs) in use have bias and are therefore unfair. The majority of the research in NLP bias has focused on societal stereotype biases embedded in LLMs. However, our research focuses on all types of biases, namely model class level bias, stereotype bias and domain bias present in LLMs. Model class …


Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients, Carlyn Childress Apr 2023

Honors College Theses

Glaucoma is a group of eye diseases in which damage gradually occurs to the optic nerve, which often leads to partial or complete loss of vision. Glaucoma is the second leading cause of blindness, and there is no cure for it. Early detection and tracking of its progression are key to managing the effects of glaucoma. Ordinary Least Squares Regression (OLSR), the most commonly used methodology for tracking glaucoma progression, is inappropriate because the longitudinally collected perimetry data from glaucoma patients appear to be temporally correlated. Time series models, which account for temporal correlation, are better methods to analyze Mean …


Analyzing Relationships With Machine Learning, Oscar Ko Feb 2023

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan Jan 2023

Theses and Dissertations--Statistics

Neural networks have experienced widespread adoption and have become integral in cutting-edge domains like computer vision, natural language processing, and various contemporary fields. However, addressing the statistical aspects of neural networks has been a persistent challenge, with limited satisfactory results. In my research, I focused on exploring statistical intervals applied to neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …


Application Of Sentiment Analysis And Machine Learning Techniques To Predict Daily Cryptocurrency Price Returns, Edward Wu Jan 2023

CMC Senior Theses

This paper examines the effects of social media sentiment relating to Bitcoin on the daily price returns of Bitcoin and other popular cryptocurrencies by utilizing sentiment analysis and machine learning techniques to predict daily price returns. Many investors think that social media sentiment affects cryptocurrency prices. However, the results of this paper find that social media sentiment relating to Bitcoin does not add significant predictive value to forecasting daily price returns for each of the six cryptocurrencies used for analysis and that machine learning models that do not assume linearity between the current day price return and previous daily price …


Applications Of Transfer Learning From Malicious To Vulnerable Binaries, Sean Patrick Mcnulty Jan 2023

Graduate Student Theses, Dissertations, & Professional Papers

Malware detection and vulnerability detection are important cybersecurity tasks. Previous research has successfully applied a variety of machine learning methods to both. However, despite their potential synergies, previous research has yet to unite these two tasks. Given the recent success of transfer learning in many domains, such as language modeling and image recognition, this thesis investigated the use of transfer learning to improve vulnerability detection. Specifically, we pre-trained a series of models to detect malicious binaries and used the weights from those models to kickstart the detection of vulnerable binaries. In our study, we also investigated five different data representations …


Network Intrusion Detection Using Deep Reinforcement Learning, Hamed T. Sanusi Jan 2023

Electronic Theses and Dissertations

This thesis delves into cybersecurity by applying Deep Reinforcement Learning (DRL) in network intrusion detection. One advantage of DRL is the ability to adapt to changing network conditions and evolving attack methods, making it a promising solution for addressing the challenges involved in intrusion detection. The thesis will also discuss the obstacles and benefits of using classification methods for network intrusion detection and the need for high-quality training data. To train and test our proposed method, the NSL-KDD dataset was used and then adjusted by converting it from a multi-classification to a binary classification, achieved by joining all attacks into one. …


Understanding Consumers' Use Experience On Electrically Heated Jacket: A Study On Online Review Using Topic Modeling, Md Nakib-Ul Hasan Aug 2022

LSU Doctoral Dissertations

The demand for heated jackets is anticipated to be fuelled by frequent temperature drops, severe winter weather, and increasing outdoor activities. Electrically heated jackets (EHJ) are primarily marketed through online distribution channels and expansion of online sales channels is expected to boost the global market. Consumers are increasingly relying on online reviews from other consumers to help them decide what to buy. Businesses also actively monitor and manage their online reviews to build trust in their brand and make it more likely that customers will buy. Traditional approaches for assessing customer behavior, such as market research surveys and focus groups, …


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche Aug 2022

Electronic Theses and Dissertations

The recent rise of big data technology surrounding the electronic systems and developed toolkits gave birth to new promises for Artificial Intelligence (AI). With the continuous use of data-centric systems and machines in our lives, such as social media, surveys, emails, reports, etc., there is no doubt that data has become the center of attention for scientists and motivated them to provide more decision-making and operational support systems across multiple domains. With the recent breakthroughs in artificial intelligence, the use of machine learning and deep learning models has achieved remarkable advances in computer vision, ecommerce, cybersecurity, and healthcare. Particularly, numerous …


Statistical Extensions Of Multi-Task Learning With Semiparametric Methods And Task Diagnostics, Nikolay Miller Jun 2022

Mathematics & Statistics ETDs

In this dissertation, I propose new approaches to multi-task learning, inspired by statistical model diagnostics and semiparametric and additive modeling. The newly designed additive multi-task model framework allows for flexible estimation of multi-task parametric and nonparametric effects by using an extension of the backfitting algorithm. Further, I propose new methods for statistical task diagnostics, which allow for the identification and remedy of outlier tasks, based on task-specific performance metrics and their empirical distributions. I perform a deep examination of the well-established multi-task kernel method and achieve theoretical and experimental contributions. Lastly, I propose a two-step modeling approach to multi-task modeling, …


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research, using rigorous DFT calculations, high-throughput ab initio simulations for 23 AlN alloys are generated.

This research is the first to report strong enhancements of piezoelectric properties …


Data Ethics: An Investigation Of Data, Algorithms, And Practice, Gabrialla S. Cockerell May 2022

Honors Projects

This paper encompasses an examination of defective data collection, algorithms, and practices that continue to be cycled through society under the illusion that all information is processed uniformly, and technological innovation consistently parallels societal betterment. However, vulnerable communities, typically the impoverished and racially discriminated, get ensnared in these harmful cycles due to their disadvantages. Their hindrances are reflected in their information due to the interconnectedness of data, such as race being highly correlated to wealth, education, and location. However, their information continues to be analyzed with the same measures as populations who are not significantly affected by racial bias. Not …


Impact Of Climate Oscillations/Indices On Hydrological Variables In The Mississippi River Valley Alluvial Aquifer., Meena Raju May 2022

Theses and Dissertations

The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between hydrological variables, and to evaluate the influence of global climate indices on hydrological variables. The non-parametric MMK and Pettitt’s tests were used to analyze trends and change points. PCC and streamflow elasticity analysis were used to analyze the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and MLR analysis …


Finding A Representative Distribution For The Tail Index Alpha, α, For Stock Return Data From The New York Stock Exchange, Jett Burns May 2022

Electronic Theses and Dissertations

Statistical inference is a tool for creating models that can accurately display real-world events. Special importance is given to the financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is α. This research finds a representative distribution that models α. The absolute value of standardized stock returns from the Center for Research on Security Prices are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The …


Early-Warning Alert Systems For Financial-Instability Detection: An HMM-Driven Approach, Xing Gu Apr 2022

Electronic Thesis and Dissertation Repository

Regulators’ early intervention is crucial when the financial system is experiencing difficulties. Financial stability must be preserved to avert banks’ bailouts, which hugely drain government's financial resources. Detecting periods of financial crisis in advance entails the development and customisation of accurate and robust quantitative techniques. The goal of this thesis is to construct automated systems via the interplay of various mathematical and statistical methodologies to signal financial instability episodes in the near-term horizon. These signal alerts could provide regulatory bodies with the capacity to initiate an appropriate response that will thwart or at least minimise the occurrence of a financial crisis. …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Transition Metal Phosphides For High Performance Electrochemical Energy Storage Devices, Amina Saleh Jan 2022

Theses and Dissertations

Electrochemical energy storage technologies are nowadays playing a leading role in the global effort to address the energy challenges. A lot of attention has been devoted to designing hybrid devices known as supercapatteries which combine the merits of supercapacitors (high power density) and rechargeable batteries (high energy density). Transition metal phosphides (TMP) are a rising star for supercapattery anode materials thanks to their high conductivity, metalloid characteristics, and kinetic favorability for fast electron transport. Herein, new TMP-based materials were synthesized for use as supercapattery positive electrodes, via a multifaceted approach to yield devices enjoying concurrently high power and energy densities. …


Finding The Best Predictors For Foot Traffic In Us Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused state and nation-wide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector in particular suffered greatly: illegal fishing increased, its goods are perishable, it is seasonal in some places, and imports and exports were slowed. Foot traffic data helps business owners know how much to order, how many employees to schedule, etc. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …