Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

324 Full-Text Articles; 820 Authors; 26,810 Downloads; 87 Institutions

All Articles in Data Science

Learn Biologically Meaningful Representation With Transfer Learning, Di He 2021 City University of New York (CUNY)

Dissertations, Theses, and Capstone Projects

Machine learning has made significant contributions to bioinformatics and computational biology. In particular, supervised learning approaches have been widely used to solve problems such as biomarker identification and drug response prediction. However, because comprehensively labeled, clean data are scarce, building predictive models in supervised settings is not always desirable or possible, especially with data-hungry, red-hot learning paradigms such as deep learning. Hence, there is an urgent need for new approaches that can leverage more readily available unlabeled data to drive successful machine learning applications in this ...


An Empirical Study Of Refactorings And Technical Debt In Machine Learning Systems, Yiming Tang, Raffi T. Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja 2021 CUNY Graduate Center

Publications and Research

Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., systems with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they are composed of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a knowledge gap regarding how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed ...


Using Machine Learning Methods To Predict The Movement Trajectories Of The Louisiana Black Bear, Daniel Clark, David Shaw, Armando Vela, Shane Weinstock, John Santerre, Joseph D. Clark 2021 Southern Methodist University

SMU Data Science Review

In 1992, the Louisiana black bear (Ursus americanus luteolus) was placed on the U.S. Endangered Species List because bear populations in Louisiana were so small and isolated that they could not intermix with other populations and grow. Interchange of individuals between subpopulations of bears in Louisiana is critical to maintaining genetic diversity and avoiding inbreeding effects. Using GPS (Global Positioning System) data gathered from 31 radio-collared bears from 2010 through 2012, this research will investigate how bears traverse the landscape, which has implications for gene exchange. This paper will leverage machine learning tools to improve ...


Analyzing Empirical Quality Metrics Of Deep Learning Models For Antimicrobial Resistance, Huy H. Nguyen, Sanjay Pillay, Allison Roderick, Hao Wang, John Santerre 2021 Southern Methodist University

SMU Data Science Review

Antimicrobial Resistance (AMR) is a growing concern in the medical field. Over-prescription of antibiotics and bacterial mutations have made some once-lifesaving drugs ineffective against bacteria. However, the problem of AMR might be addressed using Machine Learning (ML), thanks to the increased availability of genomic data and large computing resources. The Pathosystems Resource Integration Center (PATRIC) has genomic data for various bacterial genera, with sample isolates that are either resistant or susceptible to certain antibiotics. Past research has applied ML algorithms to this database to model AMR with successful results, including accuracies over 80%. To better ...


Introducing Reproducibility To Citation Analysis: A Case Study In The Earth Sciences, Samantha Teplitzky, Wynn Tranfield, Mea Warren, Philip White 2021 University of California, Berkeley

Journal of eScience Librarianship

Objectives:

  • Replicate methods from a 2019 study of Earth Science researcher citation practices.
  • Calculate programmatically whether researchers in Earth Science rely on a smaller subset of literature than estimated by the 80/20 rule.
  • Determine whether these reproducible citation analysis methods can be used to analyze open access uptake.

Methods: Replicating the methods of a prior citation study yields an updated, transparent, reproducible citation analysis protocol that others can rerun with Jupyter Notebooks.

Results: This study replicated the prior citation study’s conclusions, and also adapted the author’s methods to analyze the citation practices of Earth Scientists at four institutions ...
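
The 80/20 check named in the objectives can be sketched in a few lines of Python. This is a minimal illustration, not the authors' notebook code: it assumes citations have already been extracted into a flat list with one cited-work identifier per citation, and the function name `share_of_works_for_citations` and the toy data are hypothetical. It ranks cited works by citation count and reports the fraction of distinct works needed to reach 80% of all citations; a value below 0.2 would suggest reliance on a smaller subset of literature than the 80/20 rule predicts.

```python
from collections import Counter


def share_of_works_for_citations(cited_works, target=0.8):
    """Fraction of distinct cited works that together receive `target`
    of all citations, counting the most-cited works first.

    `cited_works` is a hypothetical flat list with one entry (e.g. a DOI)
    per citation.
    """
    if not cited_works:
        raise ValueError("no citations supplied")
    # Citation counts per distinct work, most-cited first.
    counts = sorted(Counter(cited_works).values(), reverse=True)
    total = sum(counts)
    running = 0
    for n_works, c in enumerate(counts, start=1):
        running += c
        if running >= target * total:
            return n_works / len(counts)
    return 1.0  # only reachable if target > 1


# Hypothetical toy data: one heavily cited work and three works cited once.
citations = ["doi:A"] * 9 + ["doi:B", "doi:C", "doi:D"]
share = share_of_works_for_citations(citations)
print(share)        # 0.5
print(share < 0.2)  # False: no stronger concentration than 80/20 here
```

Counting distinct works rather than citations in the denominator is what makes the result comparable to the "20% of the literature" side of the rule.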


Analysis Of Individual Player Performances And Their Effect On Winning In College Soccer, Angelo Bravo, Thomas Karba, Sean McWhirter, Billy Nayden 2021 Southern Methodist University

SMU Data Science Review

This study describes the process of modernizing the approach of the Southern Methodist University (SMU) Men's Soccer coaching staff through the use of location and tracking data from their matches in the 2019 season. It uses a variety of modeling and analysis techniques to explore and categorize the data and to evaluate the types of plays most often correlated with victories. This study's contribution to college soccer analytics includes the implementation of a model to determine individual players' performance, the production of team-level metrics, and visualizations to increase the efficiency of the coaching ...


Machine Learning In The Health Industry: Predicting Congestive Heart Failure And Impactors, Alexandra Norman, James Harding, Daria Zhukova 2021 Southern Methodist University

SMU Data Science Review

Cardiovascular diseases, Congestive Heart Failure in particular, are a leading cause of death worldwide. Congestive Heart Failure has high mortality and morbidity rates. The key to decreasing the morbidity and mortality rates associated with Congestive Heart Failure is a method to detect high-risk individuals before this often-fatal disease develops. Giving high-risk individuals advance knowledge of risk factors that could lead to Congestive Heart Failure increases the likelihood of preventing the disease through lifestyle changes for healthy living. When dealing with healthcare and patient data, there are restrictions that lead to difficulties accessing ...


Generating And Smoothing Handwriting With Long Short-Term Memory Networks, Muchigi Kimari, Edward Fry, Ikenna Nwaogu, YuMei Bennett, John Santerre 2021 Southern Methodist University

SMU Data Science Review

This project explores different neural network methods for generating synthetic handwritten text. The goal is to offer people with dysgraphia an AI tool that generates handwriting while preserving the individual's style. As part of this project, an application development framework is set up on GitHub so that others can continue to explore and improve the AI tool.


A Machine Learning Method Of Determining Causal Inference Applied To Shifts In Voting Preferences Between 2012-2016, Jaclyn A. Coate, Reagan Meagher, Megan Riley, John Santerre 2021 Southern Methodist University

SMU Data Science Review

This research investigates the application of machine learning techniques to assist in the execution of a synthetic control model. The model was used to analyze counties within the United States whose voter share shifted from a Democratic majority to a Republican majority between the 2012 and 2016 election cycles. The study applies two steps of machine learning analysis. The first, the treatment discovery process, uses a Random Forest to evaluate feature importance. The second is the execution of the synthetic control model with two predictor variable lists. The first was the parametric method: a ...
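
The core of a synthetic control model is choosing non-negative donor weights that sum to 1 so that the weighted combination of untreated (donor) counties tracks the treated county before treatment. The sketch below solves that constrained least-squares problem with the Frank-Wolfe method and exact line search. The solver, the function name `synthetic_control_weights`, and the toy numbers are assumptions for illustration; the sketch does not reproduce the authors' Random Forest step, their predictor lists, or their software.

```python
def synthetic_control_weights(treated, donors, iters=500):
    """Return convex weights w (w_j >= 0, sum w_j = 1) minimising
    sum_t (sum_j w_j * donors[j][t] - treated[t])**2 over pre-treatment periods t.

    Solved with Frank-Wolfe plus exact line search (an illustrative choice).
    """
    J, T = len(donors), len(treated)
    w = [1.0 / J] * J  # start from equal weights on every donor

    for _ in range(iters):
        synth = [sum(w[j] * donors[j][t] for j in range(J)) for t in range(T)]
        resid = [synth[t] - treated[t] for t in range(T)]
        # Gradient of the squared gap w.r.t. each weight (dropping the factor 2).
        grad = [sum(donors[j][t] * resid[t] for t in range(T)) for j in range(J)]
        best = min(range(J), key=grad.__getitem__)
        # Moving towards vertex e_best changes the synthetic series by gamma * d.
        d = [donors[best][t] - synth[t] for t in range(T)]
        dd = sum(x * x for x in d)
        if dd == 0.0:
            break  # synthetic series already equals the best donor: optimal
        # Exact minimiser of ||resid + gamma * d||^2, clipped to [0, 1].
        gamma = min(1.0, max(0.0, -sum(r * x for r, x in zip(resid, d)) / dd))
        if gamma == 0.0:
            break  # no descent towards the best vertex: weights are optimal
        w = [(1.0 - gamma) * wj for wj in w]
        w[best] += gamma
    return w


# Toy, invented pre-treatment outcomes (e.g. a county's vote share in
# earlier elections) for one treated county and three donor counties.
treated = [0.40, 0.42, 0.44]
donors = [[0.40, 0.42, 0.44], [0.55, 0.56, 0.58], [0.30, 0.31, 0.33]]
print([round(x, 3) for x in synthetic_control_weights(treated, donors)])
```

Frank-Wolfe is used here only because it keeps the weights on the simplex without a projection step; any constrained quadratic solver would serve. The post-treatment gap between the treated series and the weighted donors is then read as the estimated effect.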


Automated Analysis Of RFPs Using Natural Language Processing (NLP) For The Technology Domain, Sterling Beason, William Hinton, Yousri A. Salamah, Jordan Salsman 2021 Southern Methodist University

SMU Data Science Review

Much progress has been made in text analysis, specifically within the statistical domain of Term Frequency (TF) and Inverse Document Frequency (IDF). However, there is much room for improvement, especially in discovering Emerging Trends. Emerging Trend Detection Systems (ETDS) depend on ingesting a collection of textual data and applying TF/IDF to identify new or up-trending topics within the corpus. However, the tremendous rate of change and the sheer amount of digital information make it almost impossible for a human expert to spot emerging trends without relying on an automated ETD system. Since the U ...
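
To make the TF/IDF weighting concrete, here is a minimal pure-Python sketch using one common variant: term frequency normalized by document length, and idf = ln(N / df). Production ETD systems, and the authors' pipeline, may use different smoothing or normalization; the function name and the tokenized RFP snippets are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Score every term in every document of `corpus` (a list of token lists).

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), N = number of documents, df = documents containing t
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return scores


# Toy, invented RFP snippets (already tokenised).
corpus = [
    ["rfp", "cloud", "cloud", "migration"],
    ["rfp", "cloud", "security", "security"],
    ["rfp", "quantum", "security"],
]
for doc_scores in tf_idf(corpus):
    top = max(doc_scores, key=doc_scores.get)
    print(top, round(doc_scores[top], 3))
```

A term present in every document (here `rfp`) scores zero, while rare terms concentrated in one document score highest. Spotting an up-trending topic could then, for example, mean comparing a term's scores across documents from successive time windows; that extension is an assumption, not something the abstract specifies.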


The Social Market Economy As A Formula For Peace, Prosperity, And Sustainability, Almuth D. Merkel 2021 Kennesaw State University

Doctor of International Conflict Management Dissertations

The social market economy was developed in Germany during the interwar period amidst political and economic turmoil. With clear demarcation lines differentiating it from socialism and laissez-faire capitalism, the social market economy became a formula for peace and prosperity for post-WWII Germany. Since then, the success of the social market economy has inspired many other countries to adopt its principles. Drawing on evidence from economic history and the history of economic thought, this thesis first reviews the evolution of the fundamental principles that form the foundation of social-market economic thought. Blending the microeconomic utility maximization framework with traditional growth ...


Book Review: Data Feminism, Katherine A. Mika 2021 Harvard University

Journal of eScience Librarianship

Book review of: Data Feminism by Catherine D'Ignazio and Lauren F. Klein, The MIT Press (2020). Data Feminism combines intersectional feminism and critical data studies to invite the reader to consider: “How can we use data to remake the world?” As non-profit organizations with a mandate to provide equitable access to non-neutral information and services, libraries and library workers are uniquely positioned to advance the principles laid out in Data Feminism.


Research Focus: Pattern Recognition, 2021 DePaul University

In The Loop

A CDM health informatics team joins a global race to advance COVID-19 diagnostics through X-ray insights.


Application Of Randomness In Finance, Jose Sanchez, Daanial Ahmad, Satyanand Singh 2021 CUNY New York City College of Technology

Publications and Research

Brownian motion, also known as the Wiener process, can be thought of as the continuous-time limit of a random walk. In our project, we briefly discuss the fluctuations of financial indices and relate them to Brownian motion and to the modeling of stock prices.
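
The random-walk view of Brownian motion, and its use for stock prices, can be sketched in Python as follows. The sketch assumes geometric Brownian motion, the textbook model behind Black-Scholes; the abstract does not say which model the project used, and the function names, drift, and volatility values are hypothetical.

```python
import math
import random


def brownian_path(n_steps, T=1.0, seed=None):
    """Approximate standard Brownian motion W on [0, T] by a Gaussian random walk:
    W_0 = 0 and each step adds an independent N(0, dt) increment, dt = T / n_steps."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def gbm_prices(s0, mu, sigma, n_steps, T=1.0, seed=None):
    """Stock prices under geometric Brownian motion (an assumed model):
    S_t = S_0 * exp((mu - sigma**2 / 2) * t + sigma * W_t)."""
    dt = T / n_steps
    w = brownian_path(n_steps, T, seed)
    return [s0 * math.exp((mu - 0.5 * sigma**2) * k * dt + sigma * w_k)
            for k, w_k in enumerate(w)]


# Hypothetical parameters: 5% drift, 20% volatility, 252 trading days in one year.
prices = gbm_prices(s0=100.0, mu=0.05, sigma=0.2, n_steps=252, seed=42)
print(len(prices), prices[0], round(prices[-1], 2))
```

Because each increment has variance dt, the walk's value at time T has variance T regardless of the step count, which is the scaling that makes the random walk converge to Brownian motion. Exponentiating keeps simulated prices positive, one reason geometric rather than arithmetic Brownian motion is the usual choice for stocks.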


Spring 2021, 2021 DePaul University

In The Loop

IRL Programs Debut; Short & Sweet Pandemic Film Fest; New MS in Artificial Intelligence; Virtual Experts Talks; DePaul Trustee Producing Documentary; DemonHacks Hackathon; Silicon Valley 2.0: The DePaul Innovation Development Lab connects students and companies to spark solutions to technological challenges; Code Warrior: Ovetta Sampson has risen to challenges in digital design, journalism and athletics while inspiring others; Pattern Recognition: A CDM health informatics team joins a global race to advance COVID-19 diagnostics through X-ray insights


Using Deep Learning For Children Brain Image Analysis, Rafael Toche Pizano 2021 University of Arkansas, Fayetteville

Computer Science and Computer Engineering Undergraduate Honors Theses

Analyzing the correlation between brain volumetric/morphometry features and cognition/behavior in children is important in pediatrics, because understanding these relationships can help not only to identify children who may be at risk for illnesses but also to evaluate strategies that promote brain development. Currently, one way to do this is to use traditional statistical methods such as correlation analysis, but such an approach does not make it easy to generalize and predict how brain volumetric/morphometry will impact ...


Thruster Communication For Subsurface Environments; Turning Waste Noise Into Useful Data, Stephen Cronin 2021 Embry-Riddle Aeronautical University

PhD Dissertations and Master's Theses

Acoustic communication is one of the primary means of communicating wirelessly underwater. Whereas much of the development in wireless communication has focused on radio frequency technology, water strongly absorbs radio waves, making radio links infeasible for almost all subsurface operations. While acoustic links have enabled new capabilities for systems operating in this challenging environment, they have yet to reach the commodity availability of radio systems, meaning that an entire class of small, low-cost systems has been unable to make use of these links. The systems in question are primarily autonomous underwater vehicles (AUVs), as they ...


Using Large Pre-Trained Language Models To Track Emotions Of Cancer Patients On Twitter, Will Baker 2021 University of Arkansas, Fayetteville

Computer Science and Computer Engineering Undergraduate Honors Theses

Twitter is a microblogging website where any user can publicly release a message, called a tweet, expressing their feelings about current events or their own lives. This candid, unfiltered feedback is valuable in the spaces of healthcare and public health communications, where it may be difficult for cancer patients to divulge personal information to healthcare teams, and randomly selected patients may decline participation in surveys about their experiences. In this thesis, BERTweet, a state-of-the-art natural language processing (NLP) model, was used to predict sentiment and emotion labels for cancer-related tweets collected in 2019 and 2020. In longitudinal plots, trends in ...


Improving Bayesian Graph Convolutional Networks Using Markov Chain Monte Carlo Graph Sampling, Aneesh Komanduri 2021 University of Arkansas, Fayetteville

Computer Science and Computer Engineering Undergraduate Honors Theses

In the modern age of social media and networks, graph representations of real-world phenomena have become increasingly important. Often, we are interested in understanding how entities in a graph are interconnected. Graph Neural Networks (GNNs) have proven to be a very useful tool in a variety of graph learning tasks, including node classification, link prediction, and edge classification. However, in most of these tasks, the graph data we are working with may be noisy and may contain spurious edges; that is, there is considerable uncertainty associated with the underlying graph structure. Recent approaches to modeling uncertainty have been ...


Visual Analysis Of Historical Lessons Learned During Exercises For The United States Air Force Europe (USAFE), Samantha O'Rourke 2021 University of Nebraska at Omaha

Theses/Capstones/Creative Projects

Within the United States Air Force, repeated patterns of differences are observed during exercises. After an exercise is completed, forms are filled out detailing the observations, successes, and recommendations noted throughout the exercise. No two reports are identical, so each must be analyzed by personnel and then categorized based on common themes. Developing a computer application will greatly reduce the time and resources used to analyze each After Action Report. This application can visually represent these observations and optimize the effectiveness of these exercises. The visualization is done through graphs displaying the frequency of observations and recommendations ...


Digital Commons powered by bepress