Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,477 Full-Text Articles 2,954 Authors 434,485 Downloads 189 Institutions

All Articles in Data Science

Data To Science With AI And Human-In-The-Loop, Gustavo Perez Sarabia 2024 University of Massachusetts Amherst

Doctoral Dissertations

AI has the potential to accelerate scientific discovery by enabling scientists to analyze vast datasets more efficiently than traditional methods. For example, this thesis considers the detection of star clusters in high-resolution images of galaxies taken from space telescopes, as well as studying bird migration from RADAR images. In these applications, the goal is to make measurements to answer scientific questions, such as how the star formation rate is affected by mass, or how the phenology of bird migration is influenced by climate change. However, current computer vision systems are far from perfect for conducting these measurements directly. They may …


Historical Perspectives In Volatility Forecasting Methods With Machine Learning, Zhiang Qiu, Clemens Kownatzki, Fabien Scalzo, Eun Sang Cha 2024 Pepperdine University

Seaver College Research And Scholarly Achievement Symposium

Volatility forecasting in the financial market plays a pivotal role across a spectrum of disciplines, such as risk management, option pricing, and market making. However, volatility forecasting is challenging because volatility can only be estimated, and different factors influence volatility, ranging from macroeconomic indicators to investor sentiments. While recent works suggest advances in machine learning and artificial intelligence for volatility forecasting, a comprehensive benchmark of current statistical and learning-based methods for such purposes is lacking. Thus, this paper aims to provide a comprehensive survey of the historical evolution of volatility forecasting with a comparative benchmark of key landmark models. We …


Machine Learning Prediction Of Photoluminescence In MoS2: Challenges In Data Acquisition And A Solution Via Improved Crystal Synthesis, Ethan Swonger, John Mann, Jared Horstmann, Daniel Yang 2024 Pepperdine University

Seaver College Research And Scholarly Achievement Symposium

Transition metal dichalcogenides (TMDCs) like molybdenum disulfide (MoS2) possess unique electronic and optical properties, making them promising materials for nanotechnology. Photoluminescence (PL) is a key indicator of MoS2 crystal quality. This study aimed to develop a machine-learning model capable of predicting the peak PL wavelength of single MoS2 crystals based on micrograph analysis. Our limited ability to consistently synthesize high-quality MoS2 crystals hampered our ability to create a large set of training data. The project focus shifted towards improving MoS2 crystal synthesis to generate improved training data. We implemented a novel approach utilizing low-pressure chemical vapor deposition (LPCVD) combined with …


Deep Learning Can Be Used To Classify And Segment Plant Cell Types In Xylem Tissue, Reem Al Dabagh, Benjamin Shin, Sean Wu, Fabien Scalzo, Helen Holmlund, Jessica Lee, Chris Ghim, Samuel Fitzgerald, Marinna Grijalva 2024 Pepperdine University

Seaver College Research And Scholarly Achievement Symposium

Studies of plant anatomical traits are essential for understanding plant physiological adaptations to stressful environments. For example, shrubs in the chaparral ecosystem of southern California have adapted various xylem anatomical traits that help them survive drought and freezing. Previous studies have shown that xylem conduits with a narrow diameter allow certain chaparral shrub species to survive temperatures as low as -12 °C. Other studies have shown that increased cell wall thickness of fibers surrounding xylem vessels improves resistance to water stress-induced embolism formation. Historically, these studies on xylem anatomical traits have relied on hand measurements of cells in light micrographs, …


Mechanistic Investigation Of C—C Bond Activation Of Phosphaalkynes With Pt(0) Complexes, Roberto M. Escobar, Abdurrahman C. Ateşin, Christian Müller, William D. Jones, Tülay Ateşin 2024 The University of Texas Rio Grande Valley

Research Symposium

Carbon–carbon (C–C) bond activation has gained increased attention as a direct method for the synthesis of pharmaceuticals. Due to the thermodynamic stability and kinetic inaccessibility of the C–C bonds, however, activation of C–C bonds by homogeneous transition-metal catalysts under mild homogeneous conditions is still a challenge. Most of the systems in which the activation occurs either have aromatization or relief of ring strain as the primary driving force. The activation of unstrained C–C bonds of phosphaalkynes does not have this advantage. This study employs Density Functional Theory (DFT) calculations to elucidate Pt(0)-mediated C–CP bond activation mechanisms in phosphaalkynes. Investigating the …


Research On Boundary Reconstruction And Government Supervision Strategy For Digital Platform, Jichang DONG, Feiyang ZHAN, Wei LI, Jinlu GUO, Ying LIU 2024 School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China MOE Social Science Laboratory of Digital Economic Forecasts and Policy Simulation, University of Chinese Academy of Sciences, Beijing 100190, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

Digital platforms are the most important form of organization in the digital era. Clarifying the boundary between platform autonomy and government regulation, so that platforms can effectively perform their order-maintenance function, is the key issue in the field of digital economy governance. This study first introduces the basic model of platform autonomy and the regulatory challenges it faces, based on the background of the emergence of digital platform autonomy. Second, through a comparative analysis of the regulatory theories and legal policies on digital platform autonomy in the European Union and the United States, this …


Research On Chinese Data Sovereignty Policy Based On LDA Model And Policy Instruments, Han QIAO, Junru XU 2024 School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China MOE Social Science Laboratory of Digital Economic Forecasts and Policy Simulation, University of Chinese Academy of Sciences, Beijing 100190, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

Data sovereignty has become an important component of national sovereignty in the dual context of the digital economy development and the overall national security concept. Major countries and regions are actively carrying out data sovereignty strategic deployment and engaging in fierce competition in data resources, data technology, and data rules. This work adopts the policy text analysis method to study China’s data sovereignty policy, and employs the LDA model and policy instruments to quantitatively analyze the process evolution and thematic characteristics of China’s data sovereignty policy. Drawing on these findings, this study comprehensively considers the global data sovereignty policy and …


A Machine Learning Model Of Perturb-Seq Data For Use In Space Flight Gene Expression Profile Analysis, Liam F. Johnson, James Casaletto, Lauren Sanders, Sylvain Costes 2024 Purdue University

Graduate Industrial Research Symposium

The genetic perturbations caused by spaceflight on biological systems tend to have a system-wide effect that is often difficult to deconvolute into individual signals with specific points of origin. Single-cell multi-omic data can provide a profile of the perturbational effects but do not necessarily indicate the initial point of interference within the network. The objective of this project is to take advantage of large-scale, genome-wide perturbational datasets by using them to train a tuned machine learning model that is capable of predicting the effects of unseen perturbations in new data. Perturb-Seq datasets are large libraries of …


Characterization Of Biological Particles Using An Integrated Hyperspectral Imaging And Machine Learning, Kaeul Lim, Arezoo Ardekani 2024 Purdue University

Graduate Industrial Research Symposium

Hyperspectral imaging (HSI) is a promising modality in medicine with many potential applications. This study focuses on developing a label-free lipid nanoparticle characterization method using convolutional neural network (CNN) analysis of HSI images. The HSI data, known as a hypercube, consist of a series of images acquired at different wavelengths for the same field of view, providing continuous spectral information for each pixel. Three distinct liposome samples were collected for analysis. Advanced image preprocessing and classification methods for HSI data were developed to differentiate liposomes based on their material compositions. Our machine learning-based classification method was able to distinguish different liposome types …
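
The hypercube layout described in this abstract can be illustrated with a small array sketch. This is only an assumed illustration, not the authors' pipeline: the dimensions, the wavelength range, and the random intensities are hypothetical placeholders chosen to show how one band image and one pixel spectrum are read from the same array.

```python
import numpy as np

# Hypothetical dimensions: a 4 x 4 pixel field of view imaged at 6 wavelengths
height, width, n_bands = 4, 4, 6
wavelengths_nm = np.linspace(450, 700, n_bands)  # assumed band centers, for illustration only

rng = np.random.default_rng(0)
hypercube = rng.random((height, width, n_bands))  # synthetic intensities, not real HSI data

# One 2-D image of the whole field of view at a single wavelength
band_image = hypercube[:, :, 2]

# The full spectrum recorded at a single pixel (row 1, column 3)
pixel_spectrum = hypercube[1, 3, :]

# Flatten to (pixels, bands) so that each row is one pixel's spectrum
spectra = hypercube.reshape(-1, n_bands)
```

Keeping the wavelength axis last means a reshape is enough to move between the image view and the per-pixel spectrum view without copying band data by hand.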


Modelling The "Bottom-Up" Development Pattern Of Tar Spot Disease In Corn, Brenden Lane, Joaquín Guillermo Ramírez-Gil, Carlos Góngora-Canul, Mariela Sofia Fernandez Campos, Andres Cruz-Sancan, Fidel E. Jiménez-Beitia, Alex G. Acosta-Guatemal, Wily Sic, C. D. Cruz 2024 Purdue University

Graduate Industrial Research Symposium

In 2015, the corn-infecting pathogen Phyllachora maydis (causal agent of tar spot disease) was reported for the first time in the United States. The disease has since spread across the US, causing major yield losses. In 2021 alone, 5.88 million metric tons (231.3 million bushels) of US corn yield were lost to this disease, costing an estimated US$1.25 billion. Though fungicides can protect against these agroeconomic losses, application timing can be difficult to optimize because our understanding of tar spot dynamics is still evolving. The current view is that tar spot typically develops bottom-up through a repeating infection cycle. Because …


Geospatial Analysis Of Agricultural Potential In The United States, Diana Febrita 2024 Purdue University

Graduate Industrial Research Symposium

Traditionally, the agriculture sector is responsible for providing food and crop products. However, the role of agriculture has expanded beyond its traditional function. It is the main sector that contributes to the provision of food, income, employment, environmental protection, and local economic development. Given these roles, understanding the potential of agriculture in the United States is crucial to identifying its prospects and challenges. This study will briefly discuss the agricultural potential in the United States based on five assets: natural capital, financial capital, human capital, physical capital, and social capital. To identify the states with …


Sepsis Treatment: Reinforced Sequential Decision-Making For Saving Lives, Dipesh Tamboli, Jiayu Chen, Kiran Pranesh Jotheeswaran, Denny Yu, Vaneet Aggarwal 2024 Purdue University

Graduate Industrial Research Symposium

Sepsis, a life-threatening condition triggered by the body's exaggerated response to infection, demands urgent intervention to prevent severe complications. Existing machine learning methods for managing sepsis struggle in offline scenarios, exhibiting suboptimal performance with survival rates below 50%. Our project introduces the "PosNegDM: Reinforcement Learning with Positive and Negative Demonstrations for Sequential Decision-Making" framework utilizing an innovative transformer-based model and a feedback reinforcer to replicate expert actions while considering individual patient characteristics. A mortality classifier with 96.7% accuracy guides treatment decisions towards positive outcomes. The PosNegDM framework significantly improves patient survival, saving 97.39% of patients and outperforming established machine learning …


Accuracy Of Nitrate Hysteresis And Flushing For Agricultural Watersheds In The Midwest, Noah Rudko, Sara K. W. McMillian, Jane Frankenberger, François Birgand 2024 Purdue University

Graduate Industrial Research Symposium

Storm event-based metrics, such as hysteresis (HI) and flushing (FI), are used to differentiate nitrate pathways and sources, which is essential for watershed management. Estimations of these event-based metrics typically use high-frequency (15-minute – hourly) measurements, but daily data are also used due to their greater availability. To date, there has been no study assessing how using lower-frequency samples affects the accuracy of HI and FI, which could skew interpretation of potential nutrient pathways and sources. We used continuous measurements of nitrate collected at 9 watersheds throughout the Midwest spanning 448 storms. HI and FI were estimated from …


Online Class-Incremental Learning For Real-World Food Image Classification, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu 2024 Purdue University

Graduate Industrial Research Symposium

Food image classification is essential for monitoring health and tracking dietary intake in image-based dietary assessment methods. However, conventional systems often rely on static datasets with fixed classes and uniform distribution. In contrast, real-world food consumption patterns, shaped by cultural, economic, and personal influences, involve dynamic and evolving data. Thus, the classification system must cope with continuously evolving data. Online Class Incremental Learning (OCIL) addresses the challenge of learning continuously from a single-pass data stream while adapting to new knowledge and reducing catastrophic forgetting. Experience Replay (ER)-based OCIL methods store a small portion of previous data and …


Resource Optimization For Air Mobility Under Emergency Situations, Yongxin (Jack) Liu 2024 Embry-Riddle Aeronautical University

Math Department Colloquium Series

This project aims to improve air traffic management in emergencies. We first developed a GRU neural network to forecast weather-related airport capacity constraints using historical data, underscoring the value of real-time data analysis. We then optimized emergency evacuation air travel using Particle Swarm Optimization, demonstrating the ability to quickly aggregate evacuation flight resources cost-effectively. Finally, we provided a hybrid model combining a genetic algorithm with a neural network for evacuation planning. We show that a neural network can be integrated to accelerate genetic algorithms for efficient and performance-assured system optimization.


Assessing Gait Metrics For Early Parkinson's Disease Prediction: A Preliminary Analysis Of Underfit Models, Daniel Salinas, Gerardo Medellin, Katherine Bolado, Tomas Gomez, Kelsey Potter-Baker, Nawaz Khan Abdul Hack, Ramu Vadukapuram 2024 The University of Texas Rio Grande Valley

Research Symposium

Background: Parkinson's Disease (PD) is characterized by both motor and non-motor symptoms, and its diagnosis primarily relies on clinical presentation. There is a growing need for diagnostic tools to identify the early signs of PD, particularly the initial motor impairments often manifested as gait abnormalities. Here we seek to present preliminary findings to address this need. Our study focuses on using Machine Learning techniques (ML) to predict the PD clinical stage most efficiently and accurately. Specifically, we have sought to evaluate how spatiotemporal characteristics and other locomotor performance variables obtained on a walkway system can be utilized to identify the …


Transfer Learning In The Era Of Foundational Models: Application To Diagnosis In Rheumatology, Prashant Shekhar 2024 Embry-Riddle Aeronautical University

Math Department Colloquium Series

Problems with current synovitis grading procedures

  • There has been a lack of reliability in grading these images in the medical community due to a lack of universally accepted diagnostic criteria [Momtazmanesh et al., 2022]
  • The human/machine variability creates an additional challenge in an efficient automated scoring system [Ranganath et al., 2022]
  • There is a lack of consistency between doctors in grading these images [Momtazmanesh et al., 2022]


Session 8: Machine Learning Based Behavior Of Non-OPEC Global Supply In Crude Oil Price Determinism, Mofe Jeje 2024 North Dakota State University

SDSU Data Science Symposium

Abstract

While studies on global oil price variability occasioned by OPEC crude oil supply are well documented in the energy literature, the impact of non-OPEC global oil supply on price variability has not received commensurate attention. Given this gap, the primary objective of this study is to estimate the magnitude of oil price determinism that is explained by the share of non-OPEC global crude oil supply. Using secondary data sources, data for the target variable, annual crude oil price variability, will be collected from the US Federal Reserve, while …


Predicting Crop Yield Using Remote Sensing Data, Mary Row, Jung-Han Kimn, Hossein Moradi 2024 Saint Mary's University of Minnesota

SDSU Data Science Symposium

Accurate crop yield predictions can help farmers make adjustments or changes in their farming practices to optimize their harvest. Remote sensing is an inexpensive approach to collecting massive amounts of data that could be utilized for predicting crop yield. This study used linear regression and spatial linear models to predict soybean yield with data from Landsat 8 OLI. Each model was built using only spectral bands of the satellite, only vegetation indices, and both spectral bands and vegetation indices. All analysis was based on data collected from two fields in South Dakota from the 2019 and 2021 …


Principal Component Analysis With Application To Credit Card Data, Eleanor Cain, Semhar Michael, Gary Hatfield 2024 South Dakota State University

SDSU Data Science Symposium

Principal Component Analysis (PCA) is a type of dimension reduction technique used in data analysis to process the data before making a model. In general, dimension reduction allows analysts to make conclusions about large data sets by reducing the number of variables while retaining as much information as possible. Using the numerical variables from a data set, PCA aims to compute a smaller set of uncorrelated variables, called principal components, that account for a majority of the variability from the data. The purpose of this poster is to understand PCA as well as perform PCA on a large sample credit …
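
The dimension-reduction step this abstract describes can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the poster's analysis: the function name `pca`, the synthetic five-variable data set, and the choice of two components are all hypothetical, and the credit card data itself is not reproduced here.

```python
import numpy as np

def pca(X, n_components):
    """Return principal component scores and their share of total variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each numerical variable
    # SVD of the centered data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    scores = Xc @ Vt[:n_components].T  # project the data onto the leading directions
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 observations of 5 correlated variables driven by 2 latent factors
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

scores, ratio = pca(X, n_components=2)
```

Here `scores` holds the two new uncorrelated variables for each observation, and `ratio` reports how much of the original variability each one accounts for. In practice, variables on very different scales are often standardized before this step.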

