Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

Articles 1 - 30 of 145

Full-Text Articles in Physical Sciences and Mathematics

Try It Together - Qualitative Coding With Atlas.ti, Danping Dong, Bryan Leow May 2024

AI for Research Week

This hands-on session introduces Atlas.ti, a well-established qualitative data analysis tool for analyzing your transcripts and textual data. The session will cover coding data, extracting insights, creating visualizations, and exploring the tool's latest AI features.


Try It Together: Transcribing Your Audio With Whisper API, Bella Ratmelia May 2024

AI for Research Week

In this hands-on session, we will explore using the Whisper API to transcribe audio recordings from interviews, focus groups, and speeches. The session will delve into best practices and address common issues that may arise during the transcription process.


Academic Search And Discovery Tools In The Age Of AI And Large Language Models: An Overview Of The Space, Aaron Tay May 2024

AI for Research Week

In the ever-evolving landscape of academic research, “AI tools” for literature search and synthesis are currently getting a lot of attention. These tools promise to ramp up productivity, enabling us to accomplish more in less time or absorb more knowledge without drowning in endless reading. With the sheer number of these systems increasing daily, it's natural to wonder: are they really worth our time and money? And if they are, how should we go about picking the right one from the multitude of options?

In this talk, I will share my views on how the space has developed over two …


Dual-Domain Clustering Of Spatiotemporal Infectious Disease Data, Samuel R. Thornton, Erin C.S. Acquesta, Patrick D. Finley, Mansoor A. Haider May 2024

Biology and Medicine Through Mathematics Conference

No abstract provided.


The Classification Of Internet Memes Through Supervised And Unsupervised Machine Learning Algorithms, William H. Little May 2024

Symposium of Student Scholars

Memes, those captivating internet phenomena, effortlessly deliver online entertainment. By leveraging time-series data from Google Trends, we can vividly illustrate and dissect the dynamic trends in meme popularity. Previous studies have discerned four distinct post-peak popularity patterns— "smoothly decaying," "spikey decaying," "leveling off," and "long-term growth"—and elegantly modeled these using ordinary differential equations.

This research introduces a programmatic approach that harnesses both supervised and unsupervised machine learning algorithms. The dataset, now expanded to over 2000 elements, becomes the canvas for exploration. The K-means algorithm identifies clusters, which then serve as labels for the supervised SVC algorithm. The overarching goal is …
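
The cluster-then-classify pipeline described in this abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `KMeans` and `SVC` and uses synthetic two-dimensional points in place of the study's Google Trends features, which are not given here. The choice of four clusters (mirroring the four popularity patterns named above) and all variable names are illustrative assumptions, not the author's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Synthetic stand-in for per-meme feature vectors (hypothetical data,
# not the study's dataset): four well-separated groups of 50 points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])

# Step 1 (unsupervised): K-means assigns each sample to one of four clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Step 2 (supervised): the cluster ids serve as training labels for an SVC.
clf = SVC(kernel="rbf")
clf.fit(X, cluster_labels)

# Fraction of training points where the SVC reproduces the K-means label.
train_agreement = float((clf.predict(X) == cluster_labels).mean())
```

Because the classifier is trained on labels produced by clustering, its accuracy measures agreement with the K-means partition rather than with any independent ground truth.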


Visualizing NFL Player Metrics, Jayson Rhea Apr 2024

Campus Research Day

This project is dedicated to reshaping the exploration of NFL player data. Tailored for sports analysts and fantasy football managers, the goal is to deliver convenience through seamless data navigation and precise filtering through an interactive dashboard. In contrast to the static formats found on the NFL website and ESPN, this dynamic interface offers interactive visualizations, empowering users to effortlessly compare data. These comparisons can be used to draw quick conclusions about player performance.


Dashboard To Quickly Estimate The Cost And Duration Of An NYC Green Taxi Trip, Isaac Braun Apr 2024

Campus Research Day

Before hailing a New York City (NYC) taxi, residents and tourists do not easily know how much the trip will cost them or how long it may take. Taxis are still heavily used, even with the increase of ride-hailing services like Uber, and a new system has yet to be built to provide customers with these two metrics before taking a trip. This project aims to give riders a quick way to estimate a ride’s cost and duration through an interactive dashboard that allows filtering by pickup and drop-off neighborhoods. This is accomplished by analyzing three years of public data …


Binder, Tyler A. Peaster, Lindsey M. Davenport, Madelyn Little, Alex Bales Apr 2024

ATU Research Symposium

Binder is a mobile application that aims to introduce readers to a book recommendation service that appeals to devoted and casual readers. The main goal of Binder is to enrich book selection and reading experience. This project was created in response to deficiencies in the mobile space for book suggestions, library management, and reading personalization. The tools we used to create the project include Visual Studio, .Net Maui Framework, C#, XAML, CSS, MongoDB, NoSQL, Git, GitHub, and Figma. The project’s selection of books was sourced from the Google Books repository. Binder aims to provide an intuitive interface that allows users …


Techniques To Detect Fake Profiles On Social Media Using The New Age Algorithms – A Survey, A K M Rubaiyat Reza Habib, Edidiong Elijah Akpan Apr 2024

ATU Research Symposium

This research explores the growing issue of fake accounts in Online Social Networks [OSNs]. While platforms like Twitter, Instagram, and Facebook foster connections, their lax authentication measures have attracted many scammers and cybercriminals. Fake profiles conduct malicious activities, such as phishing, spreading misinformation, and inciting social discord. The consequences range from cyberbullying to deceptive commercial practices. Detecting fake profiles manually is often challenging and causes considerable stress and trust issues for the users. Typically, a social media user scrutinizes various elements like the profile picture, bio, and shared posts to identify fake profiles. These evaluations sometimes lead users to conclude …


Accessing Advanced National Supercomputing And Storage Resources For Computational Research, Ramazan Aygun Apr 2024

All Things Open

This presentation will cover ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support) and Kennesaw State University's involvement in the Open Science Data Federation program as a data origin to help researchers and educators, with or without supporting grants, utilize the nation’s advanced computing systems and services. ACCESS, a program established and funded by the National Science Foundation, is an ecosystem with capabilities for new modes of research and further democratizing participation. The presentation covers how to apply for allocations on ACCESS. The last part of the presentation will briefly explain the Open Science Data Federation and Kennesaw State University's involvement as …


The Vulnerabilities Of Artificial Intelligence Models And Potential Defenses, Felix Iov Apr 2024

Cybersecurity Undergraduate Research Showcase

The rapid integration of artificial intelligence (AI) into various commercial products has raised concerns about the security risks posed by adversarial attacks. These attacks manipulate input data to disrupt the functioning of AI models, potentially leading to severe consequences such as self-driving car crashes, financial losses, or data breaches. We will explore neural networks, their weaknesses, and potential defenses. We will discuss adversarial attacks including data poisoning, backdoor attacks, evasion attacks, and prompt injection. Then, we will explore defense strategies such as data protection, input sanitization, and adversarial training. By understanding how adversarial attacks work and the defenses against them, …


Localized Collocation Meshless Method For Modeling Transdermal Pharmacokinetics In Multiphase Skin Structures, Eduardo Divo Apr 2024

Math Department Colloquium Series

The human skin has a complicated structure with many multi-scale, biophysical effects impacting the propagation of skin-injected substances, such as partitioning, metabolic reactions, adsorption and elimination. An extended version of Fick’s second law governing the process of the compound diffusion in various skin layers is employed in the current work by considering the conservation of mass of the substance and the metabolic reaction of the substance in viable skin. Additionally, a model assuming linear coupling between the substance concentrations that are bound and unbound with blood was developed. Using such a model, a set of coupled partial differential equations are …


Improving Educational Delivery And Content In Juvenile Detention Centers, Yomna Elmousalami Mar 2024

Undergraduate Research Symposium

Students in juvenile detention centers have the greatest need to receive improvements in educational delivery and content; however, they are one of the “truly disadvantaged” populations in terms of receiving those improvements. This work presents a qualitative data analysis based on a focus group meeting with stakeholders at a local Juvenile Detention Center. The current educational system in juvenile detention centers is based on paper worksheets, single-room style teaching methods, outdated technology, and a shortage of textbooks and teachers. In addition, detained students typically have behavioral challenges that are deemed "undesired" in society. As a result, many students miss classes …


Historical Perspectives In Volatility Forecasting Methods With Machine Learning, Zhiang Qiu, Clemens Kownatzki, Fabien Scalzo, Eun Sang Cha Mar 2024

Seaver College Research And Scholarly Achievement Symposium

Volatility forecasting in the financial market plays a pivotal role across a spectrum of disciplines, such as risk management, option pricing, and market making. However, volatility forecasting is challenging because volatility can only be estimated, and different factors influence volatility, ranging from macroeconomic indicators to investor sentiments. While recent works suggest advances in machine learning and artificial intelligence for volatility forecasting, a comprehensive benchmark of current statistical and learning-based methods for such purposes is lacking. Thus, this paper aims to provide a comprehensive survey of the historical evolution of volatility forecasting with a comparative benchmark of key landmark models. We …


Machine Learning Prediction Of Photoluminescence In MoS2: Challenges In Data Acquisition And A Solution Via Improved Crystal Synthesis, Ethan Swonger, John Mann, Jared Horstmann, Daniel Yang Mar 2024

Seaver College Research And Scholarly Achievement Symposium

Transition metal dichalcogenides (TMDCs) like molybdenum disulfide (MoS2) possess unique electronic and optical properties, making them promising materials for nanotechnology. Photoluminescence (PL) is a key indicator of MoS2 crystal quality. This study aimed to develop a machine-learning model capable of predicting the peak PL wavelength of single MoS2 crystals based on micrograph analysis. Our limited ability to consistently synthesize high-quality MoS2 crystals hampered our ability to create a large set of training data. The project focus shifted towards improving MoS2 crystal synthesis to generate improved training data. We implemented a novel approach utilizing low-pressure chemical vapor deposition (LPCVD) combined with …


Deep Learning Can Be Used To Classify And Segment Plant Cell Types In Xylem Tissue, Reem Al Dabagh, Benjamin Shin, Sean Wu, Fabien Scalzo, Helen Holmlund, Jessica Lee, Chris Ghim, Samuel Fitzgerald, Marinna Grijalva Mar 2024

Seaver College Research And Scholarly Achievement Symposium

Studies of plant anatomical traits are essential for understanding plant physiological adaptations to stressful environments. For example, shrubs in the chaparral ecosystem of southern California have adapted various xylem anatomical traits that help them survive drought and freezing. Previous studies have shown that xylem conduits with a narrow diameter allow certain chaparral shrub species to survive temperatures as low as -12 °C. Other studies have shown that increased cell wall thickness of fibers surrounding xylem vessels improves resistance to water stress-induced embolism formation. Historically, these studies on xylem anatomical traits have relied on hand measurements of cells in light micrographs, …


Mechanistic Investigation Of C—C Bond Activation Of Phosphaalkynes With Pt(0) Complexes, Roberto M. Escobar, Abdurrahman C. Ateşin, Christian Müller, William D. Jones, Tülay Ateşin Mar 2024

Research Symposium

Carbon–carbon (C–C) bond activation has gained increased attention as a direct method for the synthesis of pharmaceuticals. Due to the thermodynamic stability and kinetic inaccessibility of the C–C bonds, however, activation of C–C bonds by homogeneous transition-metal catalysts under mild homogeneous conditions is still a challenge. Most of the systems in which the activation occurs either have aromatization or relief of ring strain as the primary driving force. The activation of unstrained C–C bonds of phosphaalkynes does not have this advantage. This study employs Density Functional Theory (DFT) calculations to elucidate Pt(0)-mediated C–CP bond activation mechanisms in phosphaalkynes. Investigating the …


Online Class-Incremental Learning For Real-World Food Image Classification, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu Mar 2024

Graduate Industrial Research Symposium

Food image classification is essential for monitoring health and tracking dietary intake in image-based dietary assessment methods. However, conventional systems often rely on static datasets with fixed classes and uniform distribution. In contrast, real-world food consumption patterns, shaped by cultural, economic, and personal influences, involve dynamic and evolving data. Thus, it requires the classification system to cope with continuously evolving data. Online Class Incremental Learning (OCIL) addresses the challenge of learning continuously from a single-pass data stream while adapting to the new knowledge and reducing catastrophic forgetting. Experience Replay (ER) based OCIL methods store a small portion of previous data and …


Characterization Of Biological Particles Using An Integrated Hyperspectral Imaging And Machine Learning, Kaeul Lim, Arezoo Ardekani Mar 2024

Graduate Industrial Research Symposium

Hyperspectral imaging (HSI) is a promising modality in medicine with many potential applications. This study focuses on developing a label-free lipid nanoparticle characterization method using a convolutional neural network (CNN) analysis of HSI images. The HSI data, hypercube, consists of a series of images acquired at different wavelengths for the same field of view, providing continuous spectra information for each pixel. Three distinct liposome samples were collected for analysis. Advanced image preprocessing and classification methods for HSI data were developed to differentiate liposomes based on their material compositions. Our machine learning-based classification method was able to distinguish different liposome types …


Geospatial Analysis Of Agricultural Potential In The United States, Diana Febrita Mar 2024

Graduate Industrial Research Symposium

Traditionally, the agriculture sector is responsible for providing food and crop products. However, the role of agriculture has expanded beyond its traditional function. It is the main sector that contributes to the provision of food, income, employment, environmental protection, and local economic development. Reflecting on the roles of agriculture, understanding the potential of agriculture in the United States is crucial to discovering the prospects and challenges. This study will briefly discuss the agricultural potential in the United States based on the five assets, including natural capital, financial capital, human capital, physical capital, and social capital. To identify the states with …


A Machine Learning Model Of Perturb-Seq Data For Use In Space Flight Gene Expression Profile Analysis, Liam F. Johnson, James Casaletto, Lauren Sanders, Sylvain Costes Mar 2024

Graduate Industrial Research Symposium

The genetic perturbations caused by spaceflight on biological systems tend to have a system-wide effect which is often difficult to deconvolute into individual signals with specific points of origin. Single cell multi-omic data can provide a profile of the perturbational effects, but does not necessarily indicate the initial point of interference within the network. The objective of this project is to take advantage of large scale and genome-wide perturbational datasets by using them to train a tuned machine learning model that is capable of predicting the effects of unseen perturbations in new data. Perturb-Seq datasets are large libraries of …


Modelling The "Bottom-Up" Development Pattern Of Tar Spot Disease In Corn, Brenden Lane, Joaquín Guillermo Ramírez-Gil, Carlos Góngora-Canul, Mariela Sofia Fernandez Campos, Andres Cruz-Sancan, Fidel E. Jiménez-Beitia, Alex G. Acosta-Guatemal, Wily Sic, C. D. Cruz Mar 2024

Graduate Industrial Research Symposium

In 2015, the corn-infecting pathogen Phyllachora maydis (causal agent of tar spot disease) was reported for the first time in the United States. The disease has since spread across the US, causing major yield losses. In 2021 alone, 5.88 million metric tons (231.3 million bushels) of US corn yield were lost to this disease, costing an estimated US$1.25 billion. Though fungicides can protect against these agroeconomic losses, application timing can be difficult to optimize because our understanding of tar spot dynamics is still evolving. The current view is that tar spot typically develops bottom-up through a repeating infection cycle. Because …


Sepsis Treatment: Reinforced Sequential Decision-Making For Saving Lives, Dipesh Tamboli, Jiayu Chen, Kiran Pranesh Jotheeswaran, Denny Yu, Vaneet Aggarwal Mar 2024

Graduate Industrial Research Symposium

Sepsis, a life-threatening condition triggered by the body's exaggerated response to infection, demands urgent intervention to prevent severe complications. Existing machine learning methods for managing sepsis struggle in offline scenarios, exhibiting suboptimal performance with survival rates below 50%. Our project introduces the "PosNegDM: Reinforcement Learning with Positive and Negative Demonstrations for Sequential Decision-Making" framework utilizing an innovative transformer-based model and a feedback reinforcer to replicate expert actions while considering individual patient characteristics. A mortality classifier with 96.7% accuracy guides treatment decisions towards positive outcomes. The PosNegDM framework significantly improves patient survival, saving 97.39% of patients and outperforming established machine learning …


Accuracy Of Nitrate Hysteresis And Flushing For Agricultural Watersheds In The Midwest, Noah Rudko, Sara K. W. Mcmillian, Jane Frankenberger, François Birgand Mar 2024

Graduate Industrial Research Symposium

Storm event-based metrics, such as hysteresis (HI) and flushing (FI), are used to differentiate nitrate pathways and sources, which is essential for watershed management. Estimations of these event-based metrics typically use high frequency (15-minute – hourly) measurements, but daily data are also used due to their greater availability. To date, there has been no study assessing how using lower frequency samples affects the accuracy of HI and FI, which could skew interpretation of potential nutrient pathways and sources. We used continuous measurements of nitrate collected at 9 watersheds throughout the Midwest spanning 448 storms. HI and FI were estimated from …


Resource Optimization For Air Mobility Under Emergency Situations, Yongxin (Jack) Liu Mar 2024

Math Department Colloquium Series

This project aims to improve air traffic management in emergencies. We first developed a GRU neural network to forecast weather-related airport capacity constraints using historical data, underscoring the value of real-time data analysis. We then optimized emergency evacuation air travel using Particle Swarm Optimization, demonstrating the ability to quickly aggregate evacuation flight resources cost-effectively. Finally, we provided a hybrid model combining a genetic algorithm with a neural network for evacuation planning, showing that a neural network can be integrated to accelerate genetic algorithms for efficient, performance-assured system optimization.


Assessing Gait Metrics For Early Parkinson's Disease Prediction: A Preliminary Analysis Of Underfit Models, Daniel Salinas, Gerardo Medellin, Katherine Bolado, Tomas Gomez, Kelsey Potter-Baker, Nawaz Khan Abdul Hack, Ramu Vadukapuram Mar 2024

Research Symposium

Background: Parkinson's Disease (PD) is characterized by both motor and non-motor symptoms, and its diagnosis primarily relies on clinical presentation. There is a growing need for diagnostic tools to identify the early signs of PD, particularly the initial motor impairments often manifested as gait abnormalities. Here we seek to present preliminary findings to address this need. Our study focuses on using Machine Learning techniques (ML) to predict the PD clinical stage most efficiently and accurately. Specifically, we have sought to evaluate how spatiotemporal characteristics and other locomotor performance variables obtained on a walkway system can be utilized to identify the …


Transfer Learning In The Era Of Foundational Models: Application To Diagnosis In Rheumatology, Prashant Shekhar Feb 2024

Math Department Colloquium Series

Problems with current synovitis grading procedures

  • There has been a lack of reliability in grading these images in the medical community due to a lack of universally accepted diagnostic criteria [Momtazmanesh et al., 2022]
  • The human/machine variability creates an additional challenge in an efficient automated scoring system [Ranganath et al., 2022]
  • There is a lack of consistency between doctors in grading these images [Momtazmanesh et al., 2022]


Session 8: Machine Learning Based Behavior Of Non-OPEC Global Supply In Crude Oil Price Determinism, Mofe Jeje Feb 2024

SDSU Data Science Symposium

While studies on global oil price variability occasioned by OPEC crude oil supply are well documented in the energy literature, the impact of non-OPEC global oil supply on price variability has not received commensurate attention. Given this gap, the primary objective of this study is to estimate the magnitude of oil price determinism that is explained by the share of non-OPEC’s global crude oil supply. Using secondary data sources, data for the target variable will be collected from the US Federal Reserve, as it relates to annual crude oil price variability, while …


Predicting Crop Yield Using Remote Sensing Data, Mary Row, Jung-Han Kimn, Hossein Moradi Feb 2024

SDSU Data Science Symposium

Accurate crop yield predictions can help farmers make adjustments or changes in their farming practices to optimize their harvest. Remote sensing data is an inexpensive approach to collecting massive amounts of data that could be utilized for predicting crop yield. This study used linear regression and spatial linear models to predict soybean yield with data from Landsat 8 OLI. Each model was built using only spectral bands of the satellite, only vegetation indices, and both spectral bands and vegetation indices. All analysis was based on data collected from two fields in South Dakota from the 2019 and 2021 …


Principal Component Analysis With Application To Credit Card Data, Eleanor Cain, Semhar Michael, Gary Hatfield Feb 2024

SDSU Data Science Symposium

Principal Component Analysis (PCA) is a type of dimension reduction technique used in data analysis to process the data before making a model. In general, dimension reduction allows analysts to make conclusions about large data sets by reducing the number of variables while retaining as much information as possible. Using the numerical variables from a data set, PCA aims to compute a smaller set of uncorrelated variables, called principal components, that account for a majority of the variability from the data. The purpose of this poster is to understand PCA as well as perform PCA on a large sample credit …
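
The PCA procedure summarized in this abstract can be sketched in a few lines. The example below is a minimal illustration, assuming NumPy and a small synthetic data set in place of the credit card data, which is not available here. The function name `pca` and all variable names are hypothetical, and the sketch uses the standard eigendecomposition of the covariance matrix rather than any method confirmed by the poster.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto its top principal components."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # covariance between variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]           # directions of largest variance
    explained_ratio = eigvals[order] / eigvals.sum()
    scores = Xc @ components                 # uncorrelated new variables
    return scores, components, explained_ratio

# Synthetic data (an assumption for illustration): three correlated
# numerical variables driven by one latent factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 1))
X = np.hstack([latent, 2.0 * latent, -latent])
X = X + 0.05 * rng.standard_normal((200, 3))

scores, components, ratio = pca(X, n_components=2)
```

When variables are measured on very different scales, it is common to standardize each column before applying PCA so that no single variable dominates the components.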