Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,348 Full-Text Articles 2,865 Authors 273,342 Downloads 182 Institutions

All Articles in Data Science

The Psychological Science Accelerator's COVID-19 Rapid-Response Dataset, Erin M. Buchanan, Andree Hartanto 2023 Harrisburg University of Science and Technology

Research Collection School of Social Sciences

In response to the COVID-19 pandemic, the Psychological Science Accelerator coordinated three large-scale psychological studies to examine the effects of loss-gain framing, cognitive reappraisals, and autonomy framing manipulations on behavioral intentions and affective measures. The data collected (April to October 2020) included specific measures for each experimental study, a general questionnaire examining health prevention behaviors and COVID-19 experience, geographical and cultural context characterization, and demographic information for each participant. Each participant started the study with the same general questions and then was randomized to complete either one longer experiment or two shorter experiments. Data were provided by 73,223 participants with …


Machine Learning Prediction Of HEA Properties, Nicholas J. Beaver, Nathaniel Melisso, Travis Murphy 2023 California Polytechnic State University, San Luis Obispo

College of Engineering Summer Undergraduate Research Program

High-entropy alloys (HEAs) are a very new development in the field of metallurgical materials. They are made up of multiple principal elements, unlike traditional alloys, which contributes to their high configurational entropy. The microstructure and properties of HEAs are not well predicted by the models developed for more common engineering alloys, and there is not enough data available on HEAs to fully represent the complex behavior of these alloys. To that end, we explore how machine learning models can be used to model the complex, high-dimensional behavior in the HEA composition space. Based on our …


Precise Method To Identify Kinase Drug Targets In Complex Diseases: The First Step Towards Sustainable And Effective Treatment, Hasbanny Irisson, Marzieh Ayati 2023 The University of Texas Rio Grande Valley

Research Symposium

Background: Kinases are enzymes that have proven to be important drug targets due to their role in critical biological mechanisms such as phosphorylation. Phosphorylation happens when a kinase catalyzes the transfer of a phosphate group to a protein in a phosphorylated site, which then becomes known as the substrate of the kinase. Any dysregulation of protein phosphorylation causes a wide range of complex diseases, including cancer. Thus, discovering the links between kinases and their substrates (i.e., predicting kinase-substrate associations (KSAs)) is crucial in developing effective and sustainable treatments. Presently, fewer than 5% of phosphorylated sites have an associated kinase, and …


Data Ethics And Privacy For Researchers, Kelley F. Rowan 2023 Florida International University

Works of the FIU Libraries

This workshop addresses specific data privacy and anonymization standards and techniques for researchers who are collecting personally identifiable information as well as sensitive information. The workshop covers federal, state, and international laws and regulations governing data privacy, as well as the development of an impact assessment and privacy policy. The second half of the workshop focuses on ethical workflows, anonymization techniques, and related resources.


Organizing PMODE Dopplergrams Of Jupiter With MATLAB, Brady T. Smith, Deborah Gulledge, Cody Shaw, Gerard Williger 2023 University of Louisville

The Cardinal Edge

The interiors of the giant planets are poorly known. At the time of writing, such investigations have been limited to measuring gravitational effects from a handful of orbital probes. The most recent attempt to map the interior is via PMODE (the Planetary Multilevel Oscillations and Dynamics Experiment), designed to explore Jupiter’s core by collecting Dopplergrams. Small radial velocity shifts in Jupiter’s upper cloud decks enable us to map its atmospheric dynamics and consequently its interior via Dioseismology (techniques similar to Helioseismology, applied to Jupiter). This campaign produced a vast dataset with more than 50,000 exposures, every 30 seconds, over 24 …


Effects Of Weight Initialization Methods On FFN's, Ida K. Karem 2023 Jefferson Community and Technical College

The Cardinal Edge

Weight initialization is the method of determining the starting values of weights in a neural network. How it is done can have massive effects on the network [2, 3, 6, 9] and can halt training if not handled properly. On the other hand, if the initialization is chosen well, it can greatly improve training and accuracy. The initialization method usually called Normalized Xavier will be referred to as Nox in this paper to avoid confusion with the Xavier initialization method. This study analyzes five methods of weight initialization (Nox, He, Xavier, Plutonian, and Self-Root), two of them …


Sentiment Analysis Of Public Perception Towards Elon Musk On Reddit (2008-2022), Daniel Maya Bonilla, Samuel Iradukunda, Pamela Thomas 2023 University of Louisville

The Cardinal Edge

As Elon Musk’s influence in technology and business continues to expand, it becomes crucial to comprehend public sentiment surrounding him in order to gauge the impact of his actions and statements. In this study, we conducted a comprehensive analysis of comments from various subreddits discussing Elon Musk over a 14-year period, from 2008 to 2022. Utilizing advanced sentiment analysis models and natural language processing techniques, we examined patterns and shifts in public sentiment towards Musk, identifying correlations with key events in his life and career. Our findings reveal that public sentiment is shaped by a multitude of factors, including his …


Machine Learning And Causality For Interpretable And Automated Decision Making, Maria Lentini 2023 Rowan University

Theses and Dissertations

This abstract explores two key areas in decision science: automated and interpretable decision making. In the first part, we address challenges related to sparse user interaction data and high item turnover rates in recommender systems. We introduce a novel algorithm called Multi-View Interactive Collaborative Filtering (MV-ICTR) that integrates user-item ratings and contextual information, improving performance, particularly for cold-start scenarios. In the second part, we focus on Student Prescription Trees (SPTs), which are interpretable decision trees. These trees use a black box "teacher" model to predict counterfactuals based on observed covariates. We experiment with a Bayesian hierarchical binomial regression model as …


Reu-Deim Classification Of Hispanic Voters In Hispanic Groups Using Name And Zip Code Data In Palm Beach, Florida, Kamila Soto-Ortiz 2023 Embry-Riddle Aeronautical University

Beyond: Undergraduate Research Journal

When registering to vote, Hispanic voters can only register as “Hispanic” in the “Race/Ethnicity” category, which causes difficulties when analyzing voting trends within the Hispanic community. Given the recent idea that not all Hispanic groups vote the same way, the goal is to create a model that may identify a voter’s Hispanic group from the information provided in the public Florida voter file. This is accomplished using name and zip code data for all voters in Palm Beach, Florida. This paper will explore the model implemented, its findings, and its limitations. Palm Beach, Florida, is met with low confidence …


A Low-Complexity Algorithm To Determine Spacecraft Trajectories, Sirani Perera 2023 Embry-Riddle Aeronautical University

Math Department Colloquium Series

The growing traffic within the Cislunar region has created a need for computationally effective methods to obtain the trajectories of spacecraft in the Cislunar region. By developing algorithms with low time and arithmetic complexities, we can effectively address these needs.

In this talk, we will present a mathematical model that uses interpolation and boundary conditions to obtain trajectories for satellites based on the principles of three-body dynamics. Following the model, we propose a low-complexity algorithm to generate satellite trajectories. Once the algorithm is proposed, we will apply it to the relevant periodic orbits in the Cislunar region. Finally, we …


Produção de Artigos Científicos no Estudo Longitudinal de Saúde do Adulto (ELSA-Brasil), 2011-2023 [Production of Scientific Articles in the Longitudinal Study of Adult Health (ELSA-Brasil), 2011-2023], Arthur Sandi Bauermann, Maria Antônia Mylius de Oliveira, Clara Akemi Basso Aseka, Luiza Dalmolin Beneduzi 2023 Universidade Federal do Rio Grande do Sul - Brasil

AMNET XX Conferencia Internacional

No abstract provided.


Uncertainties In Retrieval Of Remote Sensing Reflectance From Ocean Color Satellite Observations, Eder I. Herrera Estrella 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Ocean Color radiometry uses remote sensing to interpret ocean dynamics by retrieving remote sensing reflectance (𝑅𝑟𝑠) from satellite imagery at different scales and over different time periods. The 𝑅𝑟𝑠 spectrum characterizes the ocean color that we observe, and from it we can discern concentrations of chlorophyll, organic and inorganic particles, and carbon fluxes in the ocean and atmosphere. 𝑅𝑟𝑠 is derived from the total radiance at the top of the atmosphere (TOA). However, it only represents up to ten percent of the total signal. Hence, the retrieval of 𝑅𝑟𝑠 from the total radiance at TOA involves the application of atmospheric correction …


Syllabus For Computational Physics (Phys 39907), Mark D. Shattuck 2023 CUNY City College

Open Educational Resources

Syllabus for City College of New York Computational Physics course.


A Neural-Network-Based Landscape Search Engine: LSE Wisconsin, Matthew Haffner, Matthew DeWitte, Papia F. Rozario, Gustavo A. Ovando-Montejo 2023 University of Wisconsin–Eau Claire

Environment and Society Faculty Publications

The task of image retrieval is common in the world of data science and deep learning, but it has received less attention in the field of remote sensing. The authors seek to fill this gap in research through the presentation of a web-based landscape search engine for the US state of Wisconsin. The application allows users to select a location on the map and to find similar locations based on terrain and vegetation characteristics. It utilizes three neural network models—VGG16, ResNet-50, and NasNet—on digital elevation model data, and uses the NDVI mean and standard deviation for comparing vegetation data. The …


Static Malware Family Clustering Via Structural And Functional Characteristics, David George, Andre Mauldin, Josh Mitchell, Sufiyan Mohammed, Robert Slater 2023 Southern Methodist University

SMU Data Science Review

Static and dynamic analyses are the two primary approaches to analyzing malicious applications. The primary distinction between the two is that the application is analyzed without execution in static analysis, whereas the dynamic approach executes the malware and records the behavior exhibited during execution. Although each approach has advantages and disadvantages, dynamic analysis has been more widely accepted and utilized by the research community whereas static analysis has not seen the same attention. This study aims to apply advancements in static analysis techniques to demonstrate the identification of fine-grained functionality, and show, through clustering, how malicious applications may be grouped …


Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy 2023 Southern Methodist University

SMU Data Science Review

American Football is a billion-dollar industry in the United States. The analytical aspect of the sport is an ever-growing domain, with open-source competitions like the NFL Big Data Bowl accelerating this growth. With the amount of player movement during each play, tracking data can prove valuable in many areas of football analytics. While concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data, player-specific movement attributes, such as speed and agility, may be helpful in predicting play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …


A Hybrid Ensemble Of Learning Models, Bivin Sadler, Dhruba Dey, Duy Nguyen, Tavin Weeda 2023 Southern Methodist University

SMU Data Science Review

Statistical models for time series forecasting have long faced the challenge of being superseded by deep learning models. This research proposes a new hybrid ensemble of forecasting models that combines the strengths of several strong candidates from these two model types. The proposed ensemble aims to improve the accuracy of forecasts and reduce computational complexity by leveraging the strengths of each candidate model.


Forecasting COVID-19 With Temporal Hierarchies And Ensemble Methods, Li Shandross 2023 University of Massachusetts Amherst

Masters Theses

Infectious disease forecasting efforts underwent rapid growth during the COVID-19 pandemic, providing guidance for pandemic response and about potential future trends. Yet despite their importance, short-term forecasting models often struggled to produce accurate real-time predictions of this complex and rapidly changing system. This gap in accuracy persisted into the pandemic and warrants the exploration and testing of new methods to glean fresh insights.

In this work, we examined the application of the temporal hierarchical forecasting (THieF) methodology to probabilistic forecasts of COVID-19 incident hospital admissions in the United States. THieF is an innovative forecasting technique that aggregates time-series data into …


Graph Representation Learning With Box Embeddings, Dongxu Zhang 2023 University of Massachusetts Amherst

Doctoral Dissertations

Graphs are ubiquitous data structures, present in many machine-learning tasks, such as link prediction of products and node classification of scientific papers. As gradient descent drives the training of most modern machine learning architectures, the ability to encode graph-structured data using a differentiable representation is essential to make use of this data. Most approaches encode graph structure in Euclidean space; however, it is non-trivial to model directed edges. The naive solution is to represent each node using a separate "source" and "target" vector; however, this can decouple the representation, making it harder for the model to capture information within longer …


Crop Monitoring And Nutrient Prediction Using Satellite Imagery And Soil Data, Olatunde D. Akanbi, Brian Gonzalez Hernandez, Erika I. Barcelos, Arafath Nihar, Laura S. Bruckman, Yinghui Wu, Jeffrey Yarus, Roger H. French 2023 Case Western Reserve University

Student Scholarship

No abstract provided.

