Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 30 of 94

Full-Text Articles in Statistical Models

Reevaluating Texas Energy Market Forecasts In The Wake Of Recent Extreme Weather Events, Robert A. Derner, Richard W. Butler II, Alexandria Neff, Adam R. Ruthford May 2024

SMU Data Science Review

This paper provides updated forecasts of energy demand in Texas and recognizes the impact of sustainable energy. Forecasts of sustainable energy adoption need to be reexamined after Winter Storm Uri crippled the Texas power grid and left many without power. The storm highlighted problems that the Texas power grid had, and continues to struggle with, in supplying the state with energy. This paper will offer an overview of the relevant literature on the adoption of sustainable energy and relevant events that have occurred in the state of Texas that will give the reader the …


Context Aware Music Recommendation And Playlist Generation, Elias Mann May 2024

SMU Journal of Undergraduate Research

There are many reasons people listen to music, and the type of music is largely determined by what the listener is doing at the time. For example, one may listen to one type of music while commuting, another while exercising, and yet another while relaxing. Without access to the physiological state of the user, current music recommendation methods rely on collaborative filtering (recommending music based on what other similar users listen to) and content-based filtering (recommending songs based on their similarities to songs the user already prefers). With the rise in popularity of smart devices …


Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth May 2024

Electronic Theses, Projects, and Dissertations

The longstanding prevalence of hypertension, often undiagnosed, poses significant risks of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged 18-39 years. The research questions were: (Q1.) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2.) What machine learning algorithms are suited for effectively predicting hypertension? (Q3.) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1.) Feature selection using a binary logistic regression classifier reveals an array of the 30 most influential factors at an …


A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang May 2024

Computational and Data Sciences (PhD) Dissertations

This research introduces an analytical improvement to the Multivariate Ljung-Box test that addresses significant deviations of the original test from the nominal Type I error rates under almost all scenarios. Prior attempts to mitigate this issue have been directed at modifying the test statistic or correcting the test distribution to achieve precise results in finite samples. In previous studies focused on designing corrections to the univariate Ljung-Box test, a method that specifically adjusts the test rejection region has been the most successful at attaining the best Type I error rates. We adopt the same approach for the more complex, …
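
For orientation, the uncorrected statistics that the abstract refers to are commonly written as below. This is the standard textbook form, not the dissertation's corrected test; the symbols (sample size n, maximum lag h, series dimension m, sample autocorrelation \hat{\rho}_k, and sample lag-k cross-covariance matrix \hat{\Gamma}_k) are notation assumed here for illustration.

```latex
% Univariate Ljung-Box statistic for lags 1..h
Q(h) = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^{\,2}}{n-k}

% A common multivariate generalization for an m-dimensional series
Q_m(h) = n^{2} \sum_{k=1}^{h} \frac{1}{n-k}\,
  \operatorname{tr}\!\left( \hat{\Gamma}_k^{\top} \hat{\Gamma}_0^{-1}
                            \hat{\Gamma}_k \hat{\Gamma}_0^{-1} \right)

% Asymptotic reference distributions under the null of no serial correlation
% (raw series, no fitted model):
Q(h) \;\dot\sim\; \chi^{2}_{h}, \qquad Q_m(h) \;\dot\sim\; \chi^{2}_{m^{2}h}
```

The finite-sample gap between these chi-square reference distributions and the actual distribution of the statistic is the source of the Type I error deviations described above.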


Stability Of Quantum Computers, Samudra Dasgupta May 2024

Doctoral Dissertations

Quantum computing's potential is immense, promising super-polynomial reductions in execution time, energy use, and memory requirements compared to classical computers. This technology has the power to revolutionize scientific applications such as simulating many-body quantum systems to understand molecular structure, factoring large integers, and enhancing machine learning, and in the process to disrupt industries like telecommunications, material science, pharmaceuticals and artificial intelligence. However, quantum computing's potential is curtailed by noise, further complicated by non-stationary noise parameter distributions across time and qubits. This dissertation focuses on the persistent issue of noise in quantum computing, particularly non-stationarity of noise parameters in transmon processors. It …


Research On Chinese Data Sovereignty Policy Based On LDA Model And Policy Instruments, Han Qiao, Junru Xu Mar 2024

Bulletin of Chinese Academy of Sciences (Chinese Version)

Data sovereignty has become an important component of national sovereignty in the dual context of the digital economy development and the overall national security concept. Major countries and regions are actively carrying out data sovereignty strategic deployment and engaging in fierce competition in data resources, data technology, and data rules. This work adopts the policy text analysis method to study China’s data sovereignty policy, and employs the LDA model and policy instruments to quantitatively analyze the process evolution and thematic characteristics of China’s data sovereignty policy. Drawing on these findings, this study comprehensively considers the global data sovereignty policy and …


Making Sense Of Making Parole In New York, Alexandra Mcglinchy Feb 2024

Dissertations, Theses, and Capstone Projects

For many individuals incarcerated in New York, the initial step toward freedom begins with an interview with the Board of Parole. This process, however, is frequently a complex and challenging one, characterized by repeated denials and extended incarcerations. The disparity in outcomes – where one individual may receive over 20 denials and another is granted parole on their first attempt – highlights the ambiguity and inconsistency in the parole decision-making process. This project aims to clarify the factors that influence parole decisions by concentrating on measurable variables. These include age, race, duration of sentence served, proportion of sentence served, type …


Modeling Of COVID-19 Clinical Outcomes In Mexico: An Analysis Of Demographic, Clinical, And Chronic Disease Factors, Livia Clarete Feb 2024

Dissertations, Theses, and Capstone Projects

This study explores COVID-19 clinical outcomes in Mexico, focusing on demographic, clinical, and chronic disease variables to develop predictive models. In the binary classification task, the Ada Boost Classifier distinguishes survivors from non-survivors, with age, sex, ethnicity, and chronic medical conditions influencing outcomes. In multiclass classification, the Gradient Boosting Classifier categorizes patients into outcome groups.

Demographic variables, especially age, are crucial for predicting COVID-19 outcomes for both the binary and multiclass classification tasks. Clinical information about previous conditions, including chronic diseases, also holds relevance, especially diabetes, immunocompromise, and cardiovascular diseases. These insights inform public health measures and healthcare strategies, emphasizing …


Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature based on variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, yields a reasonably good predictive model with lower prediction error (MSE). Variables based on atomic radius, valence, atomic mass and thermal conductivity appeared to contribute the most to the predictive model.


A Bayesian Inversion For Emissions And Export Productivity Across The End-Cretaceous Boundary, Alexander A. Cox Jan 2024

Dartmouth College Master’s Theses

The end-Cretaceous mass extinction was marked by both the Chicxulub impact and the ongoing emplacement of the Deccan Traps flood basalt province. Both of these events perturbed the environment by the emission of climate-active volatiles, primarily CO2 and SO2. To understand the mechanism of extinction, we must disentangle the timing, duration, and intensity of volcanic and meteoritic environmental forcings. In this thesis, we used a parallel Markov chain Monte Carlo approach to invert for the aforementioned volatile emissions, export productivity, and remineralization from 67 to 65 million years ago using the LOSCAR (Long-term Ocean-atmosphere-Sediment CArbon cycle Reservoir) model. The parallel …


Simulation Of Wave Propagation In Granular Particles Using A Discrete Element Model, Syed Tahmid Hussan Jan 2024

Electronic Theses and Dissertations

Understanding the bender element mechanism and using Particle Flow Code (PFC) to simulate seismic wave behavior are important for testing the dynamic behavior of soil particles. Both discrete and finite element methods can be used to simulate wave behavior. However, the Discrete Element Method (DEM) is better suited, as micro-scale soil particles cannot be fully treated as a continuous specimen like a rod or a piece of aluminum. Recently, DEM has been widely used to study the mechanical properties of soils at the particle level, treating the particles as balls. This study presents a comparative analysis of Voigt and Best …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …


Statistical Modeling Of Bankruptcy Data, Andrew Elsfelder Jan 2024

Williams Honors College, Honors Research Projects

My project uses a dataset of bankrupt and non-bankrupt companies in Taiwan from 1999 to 2009. This data was collected from the Taiwan Economic Journal. The statistical methods I used to model the data are CHAID, CART, and logistic regression. The resulting models are tools that can predict whether a company is bankrupt or not based on other data about the company. I created multiple models with each method to find the best model for that method. I then analyzed the output from each method. Lastly, I determined which model was the best for this data based on …


Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result is not just sustainability but also increased access to fresh produce, optimized nutritional value, elimination of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand and internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze its own data to identify changes it might investigate to drive better performance. Based …


The Impacts Of The COVID-19 Pandemic On Mental Health Across Different Genders And Sexualities, Jiale Zhu, Jonas Katona Nov 2023

Undergraduate Research Journal for the Human Sciences

Current studies report an increase in psychological distress as a result of the COVID-19 pandemic. This study is interested in examining mental health disparities and how the COVID-19 pandemic has disproportionately impacted marginalized groups—and more specifically, those identified by sex, gender, and sexuality—compared with the general population. This study also considers the effects and ramifications of different policy measures taken during the course of the pandemic. We perform exploratory data modeling and analysis on several important and publicly available datasets taken during the pandemic on mental health and COVID-19 infection data across various identity groups to look for significant disparities, …


The Double Edged Sword Of The Pandemic: Exploring Associations Between COVID-19 And Social Isolation In The USA, Alexander Fulk Nov 2023

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Mathematical Modeling Of The Impact Of Lobbying On Climate Policy, Andrew Jacoby, Claire Hannah, James Hutchinson, Jasmine Narehood, Aditi Ghosh, Padmanabhan Seshaiyer Nov 2023

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Deep Q-Learning Framework For Quantitative Climate Change Adaptation Policy For Florida Road Network Due To Extreme Precipitation, Orhun Aydin Oct 2023

I-GUIDE Forum

Climate change-induced extreme weather and a growing population are increasing the pressure on aging road networks worldwide. Adaptation requires designing interventions and alterations to the road networks that consider future dynamics of flooding and increased traffic due to the growing population. This paper introduces a reinforcement learning approach to designing interventions for Florida's road network under future traffic and climate projections. Three climate models and a tide and surge model are used to create flooding and coastal inundation projections, respectively. The optimal sequence of decisions for adapting Florida's road network to minimize flooding-related disruptions is solved by using a graph-based …


Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy Aug 2023

SMU Data Science Review

American Football is a billion-dollar industry in the United States. The analytical aspect of the sport is an ever-growing domain, with open-source competitions like the NFL Big Data Bowl accelerating this growth. With the amount of player movement during each play, tracking data can prove valuable in many areas of football analytics. While concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data, player-specific movement attributes, such as speed and agility, may be helpful in predicting play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …


Forecasting COVID-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross Aug 2023

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance on pandemic response and potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Addressing The Impact Of Time-Dependent Social Groupings On Animal Survival And Recapture Rates In Mark-Recapture Studies, Alexandru M. Draghici Jun 2023

Electronic Thesis and Dissertation Repository

Mark-recapture (MR) models typically assume that individuals under study have independent survival and recapture outcomes. One such model of interest is known as the Cormack-Jolly-Seber (CJS) model. In this dissertation, we conduct three major research projects focused on studying the impact of violating the independence assumption in MR models along with presenting extensions which relax the independence assumption. In the first project, we conduct a simulation study to address the impact of failing to account for pair-bonded animals having correlated recapture and survival fates on the CJS model. We examined the impact of correlation on the likelihood ratio test (LRT), …


Analytical Approach For Monitoring The Behavior Of Patients With Pancreatic Adenocarcinoma At Different Stages As A Function Of Time, Aditya Chakaborty Dr, Chris P. Tsokos Dr May 2023

Biology and Medicine Through Mathematics Conference

No abstract provided.


Movie Recommender System Using Matrix Factorization, Roland Fiagbe May 2023

Data Science and Data Mining

Recommendation systems are a popular and beneficial field that can help people make informed decisions automatically. This technique assists users in selecting relevant information from an overwhelming amount of available data. When it comes to movie recommendations, two common methods are collaborative filtering, which compares similarities between users, and content-based filtering, which takes a user’s specific preferences into account. However, our study focuses on the collaborative filtering approach, specifically matrix factorization. Various similarity metrics are used to identify user similarities for recommendation purposes. Our project aims to predict movie ratings for unwatched movies using the MovieLens rating dataset. We developed …
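
As a rough illustration of the matrix factorization idea mentioned in this abstract, the sketch below fits a low-rank model to a small hand-made rating matrix. It is not the author's implementation: the toy data, the function name `factorize`, and the hyperparameters (`rank`, `lr`, `reg`, `epochs`) are all assumptions chosen for the example, and plain full-batch gradient descent is used for simplicity.

```python
import numpy as np


def factorize(ratings, mask, rank=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Approximate ratings with P @ Q.T using only the observed entries.

    ratings: (n_users, n_items) array; unobserved cells may hold any value.
    mask:    boolean array of the same shape, True where a rating is observed.
    All hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    for _ in range(epochs):
        # Residuals on observed cells; unobserved cells contribute zero.
        err = mask * (ratings - P @ Q.T)
        # Gradient steps on squared error with L2 regularization.
        P_step = err @ Q - reg * P
        Q_step = err.T @ P - reg * Q
        P += lr * P_step
        Q += lr * Q_step
    return P, Q


# Hypothetical toy data: 5 users x 4 movies, 0 marks an unwatched movie.
ratings = np.array(
    [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [1, 0, 0, 4],
     [0, 1, 5, 4]],
    dtype=float,
)
mask = ratings > 0

P, Q = factorize(ratings, mask)
predicted = P @ Q.T  # includes predictions for the unwatched cells
obs_rmse = float(np.sqrt(np.mean((ratings[mask] - predicted[mask]) ** 2)))
print("RMSE on observed ratings:", round(obs_rmse, 3))
print("Predicted ratings:\n", np.round(predicted, 2))
```

The entries of `predicted` at positions where `mask` is False play the role of predicted ratings for unwatched movies. On a real dataset, the error would be measured on held-out ratings rather than on the training entries.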


A Probabilistic Exploration Of Food Supplementation And Assistance, Logan Mattingly May 2023

Honors College Theses

Food insecurity is a stark threat that affects households throughout our country. Dietary insufficiency manifests itself in ways that affect health and public safety. According to researchers, individuals who suffer from food insecurity have a higher risk of aggression, anxiety, suicidal ideation and depression. These problems fall disproportionately on lower-income households. In this work, an exploratory analysis within these data sets will be performed to examine the socio-economic, biographical, nutritional, and geographical principal components of food insecurity among survey participants and how the US Supplemental Nutrition Assistance Program (SNAP) affects …


Multidimensional Investigation Of Tennessee’s Urban Forest, Jillian L. Gorrell May 2023

Doctoral Dissertations

Preserving existing trees in urban areas and properly cultivating urban forest conservation and management opportunities is valuable to the ever-growing urban environment and necessary for creating optimal experiences and educational tools to meet the needs of increasing urban populations. This dissertation contains studies investigating several facets of the urban forest, including environmental effects of deforestation and urbanization, tree equity, and urban forest facility management and accessibility. Community education and outreach at arboreta about the importance of the tree canopy can help promote environmental stewardship. A digital questionnaire was electronically distributed to representatives of arboreta certified through the Tennessee Division of …


Quantification Of Various Types Of Biases In Large Language Models, Sudhashree Sayenju Apr 2023

Doctor of Data Science and Analytics Dissertations

Natural Language Processing (NLP) systems are found everywhere on the internet, from search engines and language translation to more advanced systems like voice assistants and customer service. Since humans are always on the receiving end of NLP technologies, it is very important to analyze whether or not the Large Language Models (LLMs) in use have bias and are therefore unfair. The majority of the research in NLP bias has focused on societal stereotype biases embedded in LLMs. However, our research focuses on all types of biases, namely model class level bias, stereotype bias and domain bias present in LLMs. Model class …


Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients, Carlyn Childress Apr 2023

Honors College Theses

Glaucoma is a group of eye diseases in which damage gradually occurs to the optic nerve, which often leads to partial or complete loss of vision. Glaucoma is the second leading cause of blindness, and there is no cure. Early detection and tracking of its progression are key to managing the effects of glaucoma. Ordinary Least Squares Regression (OLSR), the most commonly used methodology for tracking glaucoma progression, is inappropriate because the longitudinally collected perimetry data from glaucoma patients appear to be temporally correlated. Time series models, which account for temporal correlation, are better methods to analyze Mean …