Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,480 Full-Text Articles 2,957 Authors 435,013 Downloads 189 Institutions

All Articles in Data Science

Unsupervised And Supervised Learning For RNA-Protein Interactions And Annotations, Kateland Sipe 2021 Bowling Green State University

Honors Projects

This project analyzed base and amino acid interactions and annotations using unsupervised and supervised learning techniques. For unsupervised learning, k-means and hierarchical clustering could not separate the data into clear groups that matched the original annotations. For supervised learning, random forest, glmnet, and deep learning neural networks produced accurate predictions. However, machine learning likely will not be able to replace the original complex program, but it could possibly be used to simplify it.


382 - WIYN Open Cluster Study: UBVRI Photometry Of NGC 2204, Kylie Snyder, Dante Scarazzini 2021 SUNY Geneseo

GREAT Day Posters

The purpose of this project was to study the open star cluster NGC 2204 using images taken with the WIYN 0.9m telescope at Kitt Peak National Observatory. These images were analyzed photometrically to determine the reddening, metallicity, age, and distance modulus of the star cluster. Each image was analyzed using software that determined the point spread function and applied that function to determine the magnitude of each star in that image. These magnitudes were taken for each filter, UBVRI, and then combined and averaged to create a single catalog. Standard stars, taken on the same night, were used …


An Exploratory Analysis Of The BGSU Learning Commons Student Usage Data, Emily Eskuri 2021 Bowling Green State University

Honors Projects

The purpose of this study was to explore past student usage data in individualized tutoring sessions from the Learning Commons from two academic years. The Bowling Green State University (BGSU) Learning Commons is a learning assistance center that offers various services, such as individualized tutoring, math assistance, writing assistance, study hours, and academic coaching. There have been limited research studies into how big data and analytics can have an impact in higher education, especially research utilizing predictive analytics.

This project applied analytics to individualized tutoring data in the Learning Commons to create a better understanding of why those trends happen …


Integrating Common Data Analytics Tools Into Non-Technical Undergraduate Curricula, Kurt Kirstein 2021 Central Washington University

All Faculty Scholarship for the College of Education and Professional Studies

Aside from statistics courses, accessible data analytics skills are often excluded from traditional non-technical university programs. These are topics that are typically the domain of programs that focus on math, statistics and computer science. Yet the need for these skills in non-technical disciplines is changing. A rapid expansion of data-related processes in organizations of many types requires individuals who have at least a working knowledge of common analytic tools. This article briefly describes three categories of data analytics tools that can be useful for graduates in any discipline. The first category covers descriptive tools that allow students to learn what …


OIT Web App, Stanley Ritsema 2021 Western Michigan University

Honors Theses

The goal of this project was to create a web app to assist WMU’s help desk in handling various user issues relating to Office365 and WebEx. The three issues of unblocking email, enabling live-streaming, and changing the URL of a personal meeting room all require administrative access, but the OIT department wanted to empower front desk staff to handle such requests. The project was designed using a web server that takes user input from a help desk employee and executes Java functions that make API calls. In the end, we were able to successfully create this proof-of-concept prototype for two …


Exploring AI And Multiplayer In Java, Ronni Kurtzhals 2021 Minnesota State University Moorhead

Student Academic Conference

I conducted research into three topics: artificial intelligence, package deployment, and multiplayer servers in Java. This research came together to form my project presentation on the implementation of these topics, which I felt accurately demonstrated the various things I have learned from my courses at Moorhead State University. Several resources were consulted throughout the project, including the work of W3Schools and StackOverflow as well as relevant assignments and textbooks from previous classes. I found this project relevant to computer science and information systems for several reasons, such as the AI component and use of SQL data tables; but it was …


AutoML For Anomaly Detection Of Time Series And Sequences Of Short Text, Cynthia Freeman 2021 University of New Mexico

Computer Science ETDs

Automated approaches for parameter and algorithm selection greatly democratize fields such as machine learning, saving time and money as hiring experts can be prohibitively expensive. Unfortunately, anomaly detection is difficult to automate due to subjectivity and class imbalance. An anomaly detection system is presented that incorporates human-in-the-loop techniques and is dynamic, scalable, and able to work with non-annotated data. By focusing on meta-features of the input data, the system can intelligently choose the most promising anomaly detection methods. The system is agnostic to the medium of data; it only expects the data to be sequential in nature.


B31: Identifying New G Protein Coupled Receptor Kinase 2 And 3 Substrates Among Proteins Closely Linked To Breast Cancer With Positive Prognosis, Theresa Tran 2021 Roseman University of Health Sciences

Annual Research Symposium

No abstract provided.


Simulated Contact Tracing Of COVID-19 Propagation At Kutztown University For Fall 2020, Dale E. Parson 2021 Kutztown University

Computer Science and Information Technology Faculty

From mid-May through August 2020 the author designed, built, and revised two simulation programs for virtual contact tracing of COVID-19 infection propagation at Kutztown University in the fall 2020 semester, and analyzed the resulting data. The first was command-line driven and non-graphical, with results distributed to faculty and administrators on May 28. The second was a three-dimensional interactive graphical simulation, distributed to faculty, administrators, and the public as a narrated video via YouTube on July 16. The algorithm is an adaptation of spreading activation as used in theoretical psychology and artificial intelligence research since the 1970s. It propagates discrete, probable infections …


Geometric Representation Learning, Luke Vilnis 2021 University of Massachusetts Amherst

Doctoral Dissertations

Vector embedding models are a cornerstone of modern machine learning methods for knowledge representation and reasoning. These methods aim to turn semantic questions into geometric questions by learning representations of concepts and other domain objects in a lower-dimensional vector space. In that spirit, this work advocates for density- and region-based representation learning. Embedding domain elements as geometric objects beyond a single point enables us to naturally represent breadth and polysemy, make asymmetric comparisons, and answer complex queries, and it provides a strong inductive bias when labeled data is scarce. We present a model for word representation using Gaussian densities, enabling asymmetric entailment …


The Agnostic Structure Of Data Science Methods, Domenico Napoletani, Marco Panza, Daniele Struppa 2021 Chapman University

MPP Published Research

In this paper we argue that data science is a coherent and novel approach to empirical problems that, in its most general form, does not build understanding about phenomena. Within the new type of mathematization at work in data science, mathematical methods are not selected because of any relevance for a problem at hand; mathematical methods are applied to a specific problem only by ‘forcing’, i.e. on the basis of their ability to reorganize the data for further analysis and the intrinsic richness of their mathematical structure. In particular, we argue that deep learning neural networks are best understood within …


NetSci High: Bringing Agency To Diverse Teens Through The Science Of Connected Systems, Stephen M. Uzzo, Catherine B. Cramer, Hiroki Sayama, Russell Faux 2021 New York Hall of Science

Northeast Journal of Complex Systems (NEJCS)

This paper follows NetSci High, a decade-long initiative to inspire teams of teenage researchers to develop, execute and disseminate original research in network science. The project introduced high school students to the computer-based analysis of networks, and instilled in the participants the habits of mind to deepen inquiry in connected systems and statistics, and to sustain interest in continuing to study and pursue careers in fields involving network analysis. Goals of NetSci High ranged from proximal learning outcomes (e.g., increasing high school student competencies in computing and improving student attitudes toward computing) to highly distal (e.g., preparing students for 21st …


The Role Of Privacy Within The Realm Of Healthcare Wearables' Acceptance And Use, Thomas Jernejcic 2021 Dakota State University

Masters Theses & Doctoral Dissertations

The flexibility and vitality of the Internet along with technological innovation have fueled an industry focused on the design of portable devices capable of supporting personal activities and wellbeing. These computing devices, known as wearables, differ from other computers in that they are portable, specific in function, and worn or carried by the user. While there are definite benefits attributable to wearables, there are also notable risks, especially in the realm of security where personal information and/or activities are often accessible to third parties. In addition, protecting one’s private information is regularly an afterthought and thus lacking in maturity. …


Public Discourse Against Masks In The COVID-19 Era: Infodemiology Study Of Twitter Data, Mohammad A. Al-Ramahi, Ahmed El Noshokaty, Omar El-Gayar, Tareq Nasralah, Abdullah Wahbeh 2021 Texas A&M University-San Antonio

Computer Information Systems Faculty Publications

Background:

Despite scientific evidence supporting the importance of wearing masks to curtail the spread of COVID-19, wearing masks has stirred up a significant debate, particularly on social media.

Objective:

This study aimed to investigate the topics associated with the public discourse against wearing masks in the United States. We also studied the relationship between the anti-mask discourse on social media and the number of new COVID-19 cases.

Methods:

We collected a total of 51,170 English tweets between January 1, 2020, and October 27, 2020, by searching for hashtags against wearing masks. We used machine learning techniques to analyze the data …


Interrupting The Propaganda Supply Chain, Kyle Hamilton, Bojan Bozic, Luc Longo 2021 Technological University Dublin

Conference papers

In this early-stage research, a multidisciplinary approach is presented for the detection of propaganda in the media, and for modeling the spread of propaganda and disinformation using semantic web and graph theory. An ontology will be designed with theoretical underpinnings from multiple disciplines, including the social sciences and epidemiology. An additional objective of this work is to automate triple extraction from unstructured text in a way that surpasses state-of-the-art performance.


An Analysis Of The Interpretability Of Neural Networks Trained On Magnetic Resonance Imaging For Stroke Outcome Prediction, Esra Zihni, John D. Kelleher, Bryony McGarry 2021 Technological University Dublin

Conference papers

Applying deep learning models to MRI scans of acute stroke patients to extract features that are indicative of short-term outcome could assist a clinician’s treatment decisions. Deep learning models are usually accurate but are not easily interpretable. Here, we trained a convolutional neural network on ADC maps from hyperacute ischaemic stroke patients for prediction of short-term functional outcome and used an interpretability technique to highlight regions in the ADC maps that were most important in the prediction of a bad outcome. Although highly accurate, the model’s predictions were not based on aspects of the ADC maps related to stroke pathophysiology.


Modeling The Stock Market Through Game Theory, Kylie Hannafey 2021 Georgia Southern University

Honors College Theses

Game Theory is used on many occasions to help us understand interactions between decision-makers. The famous Nash equilibrium is a steady state in a model that shows the interaction of different players, in which no player can do better by choosing a different action if the actions of the other players do not change. These two concepts can be applied to numerous situations that vary in types of players, but for our research, we are focusing on businesses in the stock market. The main objective is to use Game Theory to analyze data collected from the stock market, model our …
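The Nash equilibrium condition described in this abstract can be illustrated with a minimal sketch. It is not the author's model: the function name `pure_nash_equilibria`, the two-firm framing, and the payoff numbers are hypothetical and chosen only for illustration. The code checks every pair of actions in a two-player game and keeps those where neither player can do better by changing only their own action.

```python
from itertools import product


def pure_nash_equilibria(payoffs_a, payoffs_b):
    """Return all (row, col) action pairs where neither player gains by deviating alone.

    payoffs_a[r][c] is the row player's payoff and payoffs_b[r][c] is the
    column player's payoff when the row player picks r and the column player picks c.
    """
    n_rows, n_cols = len(payoffs_a), len(payoffs_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player cannot improve by switching rows while the column stays fixed.
        row_ok = all(payoffs_a[r][c] >= payoffs_a[alt][c] for alt in range(n_rows))
        # Column player cannot improve by switching columns while the row stays fixed.
        col_ok = all(payoffs_b[r][c] >= payoffs_b[r][alt] for alt in range(n_cols))
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Hypothetical two-firm game (made-up payoffs): action 0 = "hold price", 1 = "cut price".
firm_a = [[3, 0], [5, 1]]
firm_b = [[3, 5], [0, 1]]
print(pure_nash_equilibria(firm_a, firm_b))  # prints [(1, 1)]
```

In this made-up game both firms cutting prices is the only steady state, even though both would earn more by holding prices. The sketch covers pure strategies only and does not search for mixed-strategy equilibria.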


Data-Limited Domain Adaptation And Transfer Learning For Learning Latent Expression Labels Of Child Facial Expression Images, Megan Witherow, Winston Shields, Manar Samad, Khan Iftekharuddin 2021 Old Dominion University

College of Engineering & Technology (Batten) Posters

While state-of-the-art deep learning models have demonstrated success in adult facial expression classification by leveraging large, labeled datasets, labeled data for child facial expression classification is limited. Due to differences in facial morphology and development in child and adult faces, deep learning models trained on adult data do not generalize well to child data. Recent deep domain adaptation approaches have improved the generalizability of models trained on a source domain to a target domain with few labeled samples. We propose that incorporating steps of deep transfer learning, e.g. weights initialization from the pre-trained source model and freezing model layers, may …


Collections As Data At Florida International University, Jamie Rogers 2021 Florida International University

Works of the FIU Libraries

This presentation provides an overview of the concept of collections as data; shares information about our "dLOC as Data" grant initiative, a collaboration between the Digital Library of the Caribbean (dLOC), the Florida International University (FIU) Libraries Digital Collections Center, and the University of Florida Libraries, funded by the Mellon sub-award program, "Collections as Data: Part to Whole"; and provides an opportunity to talk about how we can share more collections as data resources and undertake new and exciting projects at FIU.

Although the concept of collections as data isn't new, it is becoming more mainstream. As …


Predicting The Outcome Of NBA Games, Matthew Houde 2021 Bryant University

Honors Projects in Data Science

The aim of the project is to create a machine learning model to predict NBA games. The purpose is to build upon and improve existing models. Research into other predictive sports models and machine learning techniques was conducted to understand what is currently being done to predict NBA games and how effective it is in doing so. After a thorough literature review, the model was created using Python and a variety of machine learning techniques. The dataset used had an array of team statistics for both the home and away team for each corresponding matchup and two supporting features were …

