Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,480 Full-Text Articles 2,957 Authors 435,013 Downloads 189 Institutions

All Articles in Data Science

Sentiment-Oriented Metric Learning For Text-To-Image Retrieval, Quoc Tuan TRUONG, Hady W. LAUW 2021 Singapore Management University

Research Collection School Of Computing and Information Systems

In this era of multimedia Web, text-to-image retrieval is a critical function of search engines and visually-oriented online platforms. Traditionally, the task primarily deals with matching a text query with the most relevant images available in the corpus. To an increasing extent, the Web also features visual expressions of preferences, imbuing images with sentiments that express those preferences. Cases in point include photos in online reviews as well as social media. In this work, we study the effects of sentiment information on text-to-image retrieval. Particularly, we present two approaches for incorporating sentiment orientation into metric learning for cross-modal retrieval. Each …


A Deep Topical N-Gram Model And Topic Discovery On COVID-19 News And Research Manuscripts, Yuan Du 2021 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Topic modeling with latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and the biterm topic model (BTM) has been successfully applied in many areas, including movie reviews, recommender systems, and text summarization. However, these models may become computationally intensive when applied to a very large corpus. Considering the wide acceptance of machine learning based on deep neural networks, this research proposes two deep neural network (NN) variants of the LDA modeling technique, a 2-layer NN and a 3-layer NN. The primary goal is to handle a large corpus using manageable computational resources.

This thesis analyze …


Network-Based Analysis Of Early Pandemic Mitigation Strategies: Solutions, And Future Directions, Pegah Hozhabrierdi, Raymond Zhu, Maduakolam Onyewu, Sucheta Soundarajan 2021 Syracuse University

Northeast Journal of Complex Systems (NEJCS)

Despite the large amount of literature on mitigation strategies for pandemic spread, in practice we are still limited to naive strategies, such as lockdowns, that are not effective in controlling the spread of the disease in the long term. One major reason for adopting basic strategies in real-world settings is that, in the early stages of a pandemic, we lack knowledge of the behavior of the disease and so cannot tailor a more sophisticated response. In this study, we design different mitigation strategies for the early stages of a pandemic and perform a comprehensive comparison of them. We then propose a novel …


Evaluation Of Parametric And Nonparametric Statistical Models In Wrong-Way Driving Crash Severity Prediction, Sajidur Rahman Nafis 2021 Florida International University

FIU Electronic Theses and Dissertations

Wrong-way driving (WWD) crashes result in more fatalities per crash, involve more vehicles, and cause extended road closures compared to other types of crashes. Although crashes involving wrong-way drivers are relatively few, they often lead to fatalities and serious injuries. Researchers have been using parametric statistical models to identify factors that affect WWD crash severity. However, these parametric models are generally based on several assumptions, and the results could generate numerous errors and become questionable when these assumptions are violated. On the other hand, nonparametric methods such as data mining or machine learning techniques do not use a predetermined functional …


Virtual Network Function Embedding Under Nodal Outage Using Deep Q-Learning, Swarna Bindu Chetty, Hamed Ahmadi, Sachin Sharma, Avishek Nag 2021 University College Dublin

Articles

With the emergence of various types of applications such as delay-sensitive applications, future communication networks are expected to be increasingly complex and dynamic. Network Function Virtualization (NFV) provides the necessary support towards efficient management of such complex networks, by virtualizing network functions and placing them on shared commodity servers. However, one of the critical issues in NFV is the resource allocation for the highly complex services; moreover, this problem is classified as an NP-Hard problem. To solve this problem, our work investigates the potential of Deep Reinforcement Learning (DRL) as a swift yet accurate approach (as compared to integer linear …


The Impact Of Twitter On The National Hockey League And Its Players, Benjamin Strauss 2021 Bryant University

Honors Projects in Data Science

This study offers a new perspective on collecting and analyzing Twitter data surrounding the National Hockey League (NHL) to identify any trends or relationships between the data and overall performance during the 2021 abbreviated season. This paper provides an in-depth analysis by studying a sample of sixty of the top NHL players, specifically those who are typically top performers in the league, spanning all thirty-one teams and all positions. The study was able to identify a deeper and broader perspective of what implications can be drawn from analyzing data from Twitter to both predict and reflect both individual player …


Three-Way Analysis-Based pH-UV-Vis Spectroscopy For Quantifying Allura Red In An Energy Drink And Determining Colorant's pKa, Erdal Dinç Prof., Nazangül Ünal, Zehra Ceren Ertekin 2021 Ankara University

Journal of Food and Drug Analysis

Three-way analysis-based pH-UV-Vis spectroscopy was proposed for quantifying allura red in an energy drink product without the need for chromatographic analysis, and determining the colorant’s pKa without using any titration technique. In this study, UV-Vis spectroscopic data matrices were obtained from absorbance measurements at five different pH levels from pH 8 to pH 12 and arranged as a three-way array (wavelength × sample × pH). In the three-way analysis procedure, parallel factor analysis (PARAFAC) was implemented to decompose the three-way array into a set of trilinear components. Each set of three components relates to spectral, pH and relative concentration profiles …


Mass Incarceration In Nebraska: Data And Historical Analysis Of Inmates From 1980-2020, Anna Krause 2021 University of Nebraska - Lincoln

Honors Theses

This study examines Nebraska Department of Corrections inmate data from 1980-2020, looking specifically at inmate demographics and offense trends. State-of-the-art data analysis is conducted to collect, modify, and visualize the data sources. Inmates are grouped by the decade in which they were incarcerated. The current active prison population is also examined as its own research group. The demographic and offense trends are compared with previous local and national research. Historical context is given for evolving trends in offenses. Solutions for Nebraska prison overcrowding are presented from various interest groups. This study aims to enlighten all interested Nebraskans on who inhabits their …


Correlating Water Quality And Profile Data In The Florida Keys Using Machine Learning Methods, Alejandro M. Torres Castellanos 2021 Florida International University

FIU Electronic Theses and Dissertations

Water quality is a very active subject of research in the water science field, where its importance includes maintaining the environment, managing wastewater, and securing fresh water. However, the increase in human development has led to problems that are affecting the ecosystem. Motivated by these problems, this research aims to find a solution for understanding the coastal water of the Florida Keys. The research used machine learning methods to find a correlation between a water quality dataset and a profile measurements dataset. To achieve this objective, the research first went through cleaning, rescuing, and structuring a readable dataset of the profile measurements …


A Consent Framework For The Internet Of Things In The GDPR Era, Gerald Chikukwa 2021 Dakota State University

Masters Theses & Doctoral Dissertations

The Internet of Things (IoT) is an environment of connected physical devices and objects that communicate amongst themselves over the internet. The IoT is based on the notion of always-connected customers, which allows businesses to collect large volumes of customer data to give them a competitive edge. Most of the data collected by these IoT devices include personal information, preferences, and behaviors. However, constant connectivity and sharing of data create security and privacy concerns. Laws and regulations like the General Data Protection Regulation (GDPR) of 2016 ensure that customers are protected by providing privacy and security guidelines to businesses. Data …


Node-Independent Method For Gastroenterological Signal Processing Based On Cubic Splines, S.A. Bakhromov 2021 “Bulletin of TUIT: Management and Communication Technologies”

Bulletin of TUIT: Management and Communication Technologies

This paper discusses a local cubic spline function built independently of node points using basis functions. The size of the calculations required to find the parameters to be determined during the construction of the spline function does not depend on the number of node points. Local-based splines are used to build such spline functions. Restoration of the gastroenterological signal was performed on the basis of the spline-function model discussed in the article. The result of a cubic spline-function error independent of the node points was compared with the result of the Lagrange classical polynomial error (Table 2).


Contract Information Extraction Using Machine Learning, Zachary E. Butcher 2021 Air Force Institute of Technology

Theses and Dissertations

The Air Force Sustainment Center, assisted by the Data Analytics Resource Team and the Defense Logistics Agency, collected four million contracts onto one of the Air Force Research Laboratory’s high-power computers. This thesis focuses on the effort to determine if parts are available through those contracts. Some information is extracted using machine learning in combination with natural language processing. Where machine learning methods are unsuccessful or inappropriate, text mining techniques, such as pattern recognition and rules, are used. Upon completion, the information is combined into a Gantt chart for quick evaluation. Only 21% of the contracts have their information …


Predictive Modeling And Estimation Of The Doubling Time Of Confirmed Cases Of COVID-19 In Niger, Ibrahim Sidi Zakari, Hadiza Galadima 2021 Old Dominion University

Community & Environmental Health Faculty Publications

Modeling is increasingly used to assess scenarios and make projections on the future course of the new coronavirus disease. This allows for better planning of care as well as a relaxation or tightening of the restrictive measures decreed by the government and the health authorities. The data analyzed in this study cover the period from March 19 to June 5, 2020, and allowed predictions of new cases of COVID-19 based on a growth model with a growth rate that changes linearly over time. In addition, we calculated and predicted the doubling time of the number of positive cases in each region …
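
As an illustration of the doubling-time idea in the abstract above, here is a minimal sketch. It assumes exponential growth dN/dt = r(t)·N with a linearly changing rate r(t) = a + b·t; this model form, the function name `doubling_time`, and the parameters `a`, `b`, and `t` are hypothetical and are not taken from the paper, whose exact formulation is not given here.

```python
import math


def doubling_time(a, b=0.0, t=0.0):
    """Time tau after t for the case count to double, assuming
    dN/dt = (a + b*t) * N (an illustrative model, not the authors').

    Integrating the rate from t to t + tau gives
        (a + b*t)*tau + (b/2)*tau**2 = ln 2.
    Returns None when the count never doubles under this model.
    """
    rate_now = a + b * t
    if b == 0.0:
        # Constant rate: the classic ln(2) / r formula.
        return math.log(2) / rate_now if rate_now > 0 else None
    disc = rate_now ** 2 + 2.0 * b * math.log(2)
    if disc < 0:
        # The rate decays too quickly for the count to ever double.
        return None
    # Smallest positive root of the quadratic in tau.
    tau = (-rate_now + math.sqrt(disc)) / b
    return tau if tau > 0 else None
```

With `b = 0` the function reduces to the familiar constant-rate formula ln 2 / a; a negative `b` (a slowing epidemic) lengthens the doubling time and can make it undefined, which the function reports as `None`.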


Node Classification On Relational Graphs Using Deep-RGCNs, Nagasai Chandra 2021 California Polytechnic State University, San Luis Obispo

Master's Theses

Knowledge Graphs are fascinating concepts in machine learning as they can hold usefully structured information in the form of entities and their relations. Despite the valuable applications of such graphs, most knowledge bases remain incomplete. This missing information harms downstream applications such as information retrieval and opens a window for research in statistical relational learning tasks such as node classification and link prediction. This work proposes a deep learning framework based on existing relational convolutional (R-GCN) layers to learn on highly multi-relational data characteristic of realistic knowledge graphs for node property classification tasks. We propose a deep and improved variant, …


Clustering Web Users By Mouse Movement To Detect Bots And Botnet Attacks, Justin L. Morgan 2021 California Polytechnic State University, San Luis Obispo

Master's Theses

The need for website administrators to efficiently and accurately detect the presence of web bots has proven to be a challenging problem. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are more quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach in recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning-based bots bypassing …


JRevealPEG: A Semi-Blind JPEG Steganalysis Tool Targeting Current Open-Source Embedding Programs, Charles A. Badami 2021 Dakota State University

Masters Theses & Doctoral Dissertations

Steganography in computer science refers to the hiding of messages or data within other messages or data; the detection of these hidden messages is called steganalysis. Digital steganography can be used to hide any type of file or data, including text, images, audio, and video inside other text, image, audio, or video data. While steganography can be used to legitimately hide data for non-malicious purposes, it is also frequently used in a malicious manner. This paper proposes JRevealPEG, a software tool written in Python that will aid in the detection of steganography in JPEG images with respect to identifying a …


Explainable Recommendation With Comparative Constraints On Product Aspects, Trung-Hoang LE, Hady W. LAUW 2021 Singapore Management University

Research Collection School Of Computing and Information Systems

To aid users in choice-making, explainable recommendation models seek to provide not only accurate recommendations but also accompanying explanations that help to make sense of those recommendations. Most of the previous approaches rely on evaluative explanations, assessing the quality of an individual item along some aspects of interest to the user. In this work, we are interested in comparative explanations, the less studied problem of assessing a recommended item in comparison to another reference item.

In particular, we propose to anchor reference items on the previously adopted items in a user's history. Not only do we aim at providing comparative …


Bilateral Variational Autoencoder For Collaborative Filtering, Quoc Tuan TRUONG, Aghiles SALAH, Hady W. LAUW 2021 Singapore Management University

Research Collection School Of Computing and Information Systems

Preference data is a form of dyadic data, with measurements associated with pairs of elements arising from two discrete sets of objects. These are users and items, as well as their interactions, e.g., ratings. We are interested in learning representations for both sets of objects, i.e., users and items, to predict unknown pairwise interactions. Motivated by the recent successes of deep latent variable models, we propose Bilateral Variational Autoencoder (BiVAE), which arises from a combination of a generative model of dyadic data with two inference models, user- and item-based, parameterized by neural networks. Interestingly, our model can take the form …


Introduction To The Mathematical Analysis Of Data AMS 450, Harrison Dekker 2021 University of Rhode Island

Library Impact Statements

No abstract provided.


Big Data: Ethics, Resources, And Potential Collaboration, Matthew Zook 2021 University of Kentucky

Geography Presentations

This presentation goes over 10 simple rules for responsible big data research.

