Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,271 Full-Text Articles; 2,716 Authors; 273,342 Downloads; 179 Institutions

All Articles in Data Science

1,271 full-text articles. Page 1 of 63.

The Influence Of Allostery Governing The Changes In Protein Dynamics Upon Substitution, Joseph Hess 2023 Clemson University

All Dissertations

The focus of this research is to investigate the effects of allostery on the function/activity of an enzyme, human immunodeficiency virus type 1 (HIV-1) protease, using well-defined statistical analyses of the dynamic changes of the protein and variants with unique single point substitutions. The experimental data evaluated here characterized HIV-1 protease with only one of its potential target substrates. Probing the dynamic interactions of the residues of an enzyme and its variants can offer insight into the developmental importance for allosteric signaling and their connection to a protein’s function. The realignment of the secondary structure elements can …


Computational Analysis Of Antibody Binding Mechanisms To The Omicron RBD Of SARS-CoV-2 Spike Protein: Identification Of Epitopes And Hotspots For Developing Effective Therapeutic Strategies, Mohammed Alshahrani 2023 Chapman University

Computational and Data Sciences (PhD) Dissertations

The advent of the Omicron strain of SARS-CoV-2 has elicited apprehension regarding its potential influence on the effectiveness of current vaccines and antibody treatments. The present investigation involved the implementation of mutational scanning analyses to examine the impact of Omicron mutations on the binding affinity of four categories of antibodies that target the Omicron receptor binding domain (RBD) of the Spike protein. The study demonstrates that the Omicron variant harbors 23 unique mutations across the RBD regions I, II, III, and IV. Of these mutations, seven are shared between RBD regions I and II, while three are shared among RBD …


Future Trends And Directions For Secure Infrastructure Architecture In The Education Sector: A Systematic Review Of Recent Evidence, Isaac Atta Senior Ampofo, Isaac Atta Junior Ampofo 2023 Kwame Nkrumah University of Science and Technology

Journal of Research Initiatives

The most efficient approach to giving large numbers of students access to computational resources is through a data center. A contemporary method for building the data center's computer infrastructure is the software-defined model, which enables user tasks to be processed in a reasonable amount of time and at a reasonable cost. The researcher examines potential directions and trends for a secured infrastructure design in this article. Additionally, interoperable, highly reusable modules that can include the newest trends in the education industry are made possible by cloud-based educational software. The Reference Architecture for University Education System Using AWS Services is presented …


Instagram Travel Influencers Coping With COVID-19 Travel Disruption, Andrei Kirilenko, Katarzyna Emin, Karen Tavares 2023 University of Florida

ITSA 2022 Gran Canaria - 9th Biennial Conference: Corporate Entrepreneurship and Global Tourism Strategies After Covid 19

A significant portion of today’s marketing is done through social media influencers, that is, through bloggers with established online credibility in a certain area who are recognized and followed by a sizable online audience. In the travel and hospitality industry, influencer marketing is primarily done through Instagram due to its emphasis on visual images rather than text. COVID-19-related travel restrictions and shrinking social media advertising in the travel industry have heavily impacted travel influencers, reducing their income and forcing many out of business. We present the outcomes of a study of the top 150 online travel influencers. The analysis …


Performance Analysis Of Deep-Learning Based Open Set Recognition Algorithms For Network Intrusion Detection Systems, Gaspard Baye, Priscila Silva, Alexandre Broggi, Lance Fiondella, Nathaniel D. Bastian, Gokhan Kul 2023 Army Cyber Institute, U.S. Military Academy

ACI Journal Articles

Open Set Recognition (OSR) is the ability of a machine learning (ML) algorithm to classify the known and recognize the unknown. In other words, OSR enables novelty detection in classification algorithms. This broader approach is critical to detect new types of attacks, including zero-days, thereby improving the effectiveness and efficiency of various ML-enabled mission-critical systems, such as cyber-physical, facial recognition, spam filtering, and cyber defense systems such as intrusion detection systems (IDS). In ML algorithms, like deep learning (DL) classifiers, hyperparameters control the learning process; their values affect other model parameters, such as weights and biases, which affect the performance …


Cyber Creative Generative Adversarial Network For Novel Malicious Packets, John Pavlik, Nathaniel D. Bastian 2023 Army Cyber Institute, U.S. Military Academy

ACI Journal Articles

Machine learning (ML) requires both quantity and variety of examples in order to learn generalizable patterns. In cybersecurity, labeling network packets is a tedious and difficult task. This leads to insufficient labeled datasets of network packets for training ML-based Network Intrusion Detection Systems (NIDS) to detect malicious intrusions. Furthermore, benign network traffic and malicious cyber attacks are always evolving and changing, meaning that the existing datasets quickly become obsolete. We investigate generative ML modeling for network packet synthetic data generation/augmentation to improve NIDS detection of novel, but similar, cyber attacks by generating well-labeled synthetic network traffic. We develop a Cyber …


Autonomous Cyber Warfare Agents: Dynamic Reinforcement Learning For Defensive Cyber Operations, David A. Bierbrauer, Rob Schabinger, Caleb Carlin, Jonathan Mullin, John Pavlik, Nathaniel D. Bastian 2023 Army Cyber Institute, United States Military Academy

ACI Journal Articles

In this work, we aim to develop novel cybersecurity playbooks by exploiting dynamic reinforcement learning (RL) methods to close holes in the attack surface left open by the traditional signature-based approach to Defensive Cyber Operations (DCO). A useful first proof-of-concept is provided by the problem of training a scanning defense agent using RL; as a first line of defense, it is important to protect sensitive networks from network mapping tools. To address this challenge, we developed a hierarchical, Monte Carlo-based RL framework for the training of an autonomous agent which detects and reports the presence of Nmap scans in near …


Data-Efficient, Federated Learning For Raw Network Traffic Detection, Mikal Willeke, David A. Bierbrauer, Nathaniel D. Bastian 2023 Army Cyber Institute, United States Military Academy

ACI Journal Articles

Traditional machine learning (ML) models used for enterprise network intrusion detection systems (NIDS) typically rely on vast amounts of centralized data with expertly engineered features. Previous work, however, has shown the feasibility of using deep learning (DL) to detect malicious activity on raw network traffic payloads rather than engineered features at the edge, which is necessary for tactical military environments. In the future Internet of Battlefield Things (IoBT), the military will find itself in multiple environments with disconnected networks spread across the battlefield. These resource-constrained, data-limited networks require distributed and collaborative ML/DL models for inference that are continually trained both …


Graph Representation Learning For Context-Aware Network Intrusion Detection, Augustine Premkumar, Madeline Schneider, Carlton Spivey, John Pavlik, Nathaniel D. Bastian 2023 Army Cyber Institute, U.S. Military Academy

ACI Journal Articles

Detecting malicious activity using a network intrusion detection system (NIDS) is an ongoing battle for the cyber defender. Increasingly, cyber-attacks are sophisticated and occur rapidly, necessitating the use of machine/deep learning (ML/DL) techniques for network intrusion detection. Traditional ML/DL techniques for NIDS classifiers, however, are often unable to sufficiently find context-driven similarities between the various network flows and/or packet captures. In this work, we leverage graph representation learning (GRL) techniques to successfully detect adversarial intrusions by exploiting the graph structure of NIDS data to derive context awareness, as graphs are a universal language for describing entities and their relationships. We …


Say That Again: The Role Of Multimodal Redundancy In Communication And Context, Brandon Javier Dormes 2023 Dartmouth College

Cognitive Science Senior Theses

With several modes of expression, such as facial expressions, body language, and speech, working together to convey meaning, social communication is rich in redundancy. While redundancy is typically relegated to signal preservation, this study investigates the role of cross-modal redundancies in establishing performance context, focusing on unaided, solo performances. Drawing on information theory, I operationalize redundancy as predictability and use an array of machine learning models to featurize speakers' facial expressions, body poses, movement speeds, acoustic features, and spoken language from 24 TEDTalks and 16 episodes of Comedy Central Stand-Up Presents. This analysis demonstrates that it is possible to distinguish between these …


Utilizing Few-Shot Meta Learning Algorithms For Medical Image Segmentation, Nick Littlefield 2023 University of Southern Maine

Thinking Matters Symposium

Deep learning models can be difficult to train because they require large amounts of data, which is often unavailable or too expensive to collect or annotate. To overcome this problem, we can use few-shot meta-learning, which allows us to train deep learning models with little data. Using a few examples, meta-learning, or learning-to-learn, aims to use the experience learned during training to generalize to unknown tasks. Medical imaging is a field where meta-learning is particularly useful, as there is limited publicly available data due to patient privacy concerns and annotation costs.

This project examines how meta-learning performs …


Towards An Experimental Bibliography Of Hemispheric Reconstruction Newspapers, Joshua Ortiz Baco, Benjamin Charles Germain Lee, Jim Casey, Sarah H. Salter 2023 University of Tennessee, Knoxville

Criticism

Digital collections of newspapers have drawn broader attention to the fragmented and scattered print histories of minoritized communities. Attempts to survey these histories through bibliography, however, quickly meet with a fundamental problem: the practice of bibliographic description calls for creating a static record of social affiliations. Given the overwhelming scholarly consensus that categories such as race, ethnicity, and language are socially constructed, this article introduces an experimental bibliographic method for mapping the vast landscape of historical newspapers. This method extends the machine learning affordances of a recent project called Newspaper Navigator to enumerate the newspapers in Chronicling America according to …


The Sociolinguistics Of Code-Switching In Hong Kong’s Digital Landscape: A Mixed-Methods Exploration Of Cantonese-English Alternation Patterns On WhatsApp, Wilkinson Daniel Wong Gonzales, Yuen Man Tsang 2023 The Chinese University of Hong Kong

Journal of English and Applied Linguistics

This paper examines the prevalence of Cantonese-English code-mixing in Hong Kong through an under-researched digital medium. Prior research on this code-alternation practice has often been limited to exploring either the social or linguistic constraints of code-switching in spoken or written communication. Our study takes a holistic approach to analyzing code-switching in a hybrid medium that exhibits features of both spoken and written discourse. We specifically analyze the code-switching patterns of 24 undergraduates from a Hong Kong university on WhatsApp and examine how both social and linguistic factors potentially constrain these patterns. Utilizing a self-compiled sociolinguistic corpus as well as survey …


Population Modeling With Machine Learning Can Enhance Measures Of Mental Health - Open-Data Replication, Ty Easley, Ruiqi Chen, Kayla Hannon, Rosie Dutt, Janine Bijsterbosch 2023 Washington University School of Medicine in St. Louis

Statistical and Data Sciences: Faculty Publications

Efforts to predict trait phenotypes based on functional MRI data from large cohorts have been hampered by low prediction accuracy and/or small effect sizes. Although these findings are highly replicable, the small effect sizes are somewhat surprising given the presumed brain basis of phenotypic traits such as neuroticism and fluid intelligence. We aim to replicate previous work and additionally test multiple data manipulations that may improve prediction accuracy by addressing data pollution challenges. Specifically, we added additional fMRI features, averaged the target phenotype across multiple measurements to obtain more accurate estimates of the underlying trait, balanced the target phenotype's distribution …


Phantom Shootings, Allan Ambris 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This capstone is a website designed to critique NYC Open Data reporting with respect to shootings through a series of visualizations and discoveries. The NYPD Shooting Incidents datasets (Historic and Year to Date) introduce themselves to the user by claiming to be a “list of every shooting incident that occurred in NYC.” The supplied documentation reveals that this is not the case.

Even after one has read the supporting materials, undisclosed truths remain. My exploration of the data revealed that a single victim may be represented across multiple entries. Additionally, multiple victims may be represented by a single entry. It is …


Mosaic: Spatially-Multiplexed Edge AI Optimization Over Multiple Concurrent Video Sensing Streams, Ila Gokarn, Hemanth Sabella, Yigong Hu, Tarek Abdelzaher, Archan Misra 2023 Singapore Management University

Research Collection School Of Computing and Information Systems

Sustaining high fidelity and high throughput of perception tasks over vision sensor streams on edge devices remains a formidable challenge, especially given the continuing increase in image sizes (e.g., generated by 4K cameras) and complexity of DNN models. One promising approach involves criticality-aware processing, where the computation is directed selectively to "critical" portions of individual image frames. We introduce MOSAIC, a novel system for such criticality-aware concurrent processing of multiple vision sensing streams that provides a multiplicative increase in the achievable throughput with negligible loss in perception fidelity. MOSAIC determines critical regions from images received from multiple vision …


Statistical And Biological Analyses Of Acoustic Signals In Estrildid Finches, Moises Rivera 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Acoustic communication is a process that involves auditory perception and signal processing. Discrimination and recognition further require cognitive processes and supporting mechanisms in order to successfully identify and appropriately respond to signal senders. Although acoustic communication is common across birds, classical research has largely disregarded the perceptual abilities of perinatal altricial taxa. Chapter 1 reviews the literature of perinatal acoustic stimulation in birds, highlighting the disproportionate focus on precocial birds (e.g., chickens, ducks, quails). The long-held belief that altricial birds were incapable of acoustic perception in ovo was only recently overturned, as researchers began to find behavioral and physiological evidence …


Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan 2023 Dartmouth College

Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …


Towards Generalizable Machine Learning Models For Computer-Aided Diagnosis In Medicine, Yiyang Wang 2023 DePaul University

College of Computing and Digital Media Dissertations

Hidden stratification represents a phenomenon in which a training dataset contains unlabeled (hidden) subsets of cases that may affect machine learning model performance. Machine learning models that ignore the hidden stratification phenomenon--despite promising overall performance measured as accuracy and sensitivity--often fail at predicting the low prevalence cases, but those cases remain important. In the medical domain, patients with diseases are often less common than healthy patients, and a misdiagnosis of a patient with a disease can have significant clinical impacts. Therefore, to build a robust and trustworthy CAD system and a reliable treatment effect prediction model, we cannot only pursue …


COVID-19 In Casinos: Analysis Of COVID-19 Contamination And Spread With Economic Impact Assessment, Anastasia (Stasi) D. Baran, Jason D. Fiege 2023 nQube Data Science Inc.

International Conference on Gambling & Risk Taking

The COVID-19 pandemic caused tremendous disruption for casinos, with the virus causing various lengths of shutdowns, capacity restrictions, and social distancing strategies such as machine removals or section closures. Although most of the world has now eased off these measures, it is important to review lessons learned to understand and better prepare for similar circumstances in the future. We present Monte Carlo slot floor simulation software customized to simulate players spreading COVID-19 on the slot floor. We simulate the amount of touch surface contamination, the number of potential surface contact exposure events per day, and a proximity exposures statistic …

