Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Data Science

2020

Articles 1 - 29 of 29

Full-Text Articles in Databases and Information Systems

Data: The Good, The Bad And The Ethical, John D. Kelleher, Filipe Cabral Pinto, Luis M. Cortesao Dec 2020

Articles

With new technologies, it is often very hard to predict long-term impacts; as a result, a technology that is beneficial in the short term can still cause problems in the longer term. This is what happened with oil by-products in several areas: the use of plastic as a disposable material did not take into account the hundreds of years plastic takes to decompose, or the related long-term environmental damage. Data is said to be the new oil, a phrase meant to convey its intrinsic value. But as in …


Making Sense Of Online Public Health Debates With Visual Analytics Systems, Anton Ninkov Nov 2020

Electronic Thesis and Dissertation Repository

Online debates occur frequently and on a wide variety of topics. In particular, online debates about various public health topics (e.g., vaccines, statins, cannabis, dieting plans) are prevalent in today’s society. These debates are important because of the real-world implications they can have on public health. Therefore, it is important for public health stakeholders (i.e., those with a vested interest in public health) and the general public to be able to make sense of these debates quickly and effectively. This dissertation investigates ways of enabling sense-making of these debates with the use of visual analytics systems (VASes). VASes are computational …


An Analysis Of Technological Components In Relation To Privacy In A Smart City, Kayla Rutherford, Ben Lands, A. J. Stiles Nov 2020

James Madison Undergraduate Research Journal (JMURJ)

A smart city is an interconnection of technological components that store, process, and wirelessly transmit information to enhance the efficiency of applications and the individuals who use those applications. Over the course of the 21st century, it is expected that an overwhelming majority of the world’s population will live in urban areas and that the number of wireless devices will increase. The resulting increase in wireless data transmission means that the privacy of data will be increasingly at risk. This paper uses a holistic problem-solving approach to evaluate the security challenges posed by the technological components that make up a …


Espade: An Efficient And Semantically Secure Shortest Path Discovery For Outsourced Location-Based Services, Bharath K. Samanthula, Divyadharshini Karthikeyan, Boxiang Dong, K. Anitha Kumari Oct 2020

Department of Computer Science Faculty Scholarship and Creative Works

With the rapid growth of smart devices and technological advancements in tracking geospatial data, demand for Location-Based Services (LBS) is rising steadily in several domains, including military, healthcare and transportation. It is a natural step to migrate LBS to a cloud environment to achieve on-demand scalability and increased resiliency. Nonetheless, outsourcing sensitive location data to a third-party cloud provider raises a host of privacy concerns, as the data owners have reduced visibility and control over the outsourced data. In this paper, we consider outsourced LBS where users want to retrieve map directions without disclosing their location information. …
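The abstract does not describe ESPADE's protocol itself. For orientation, the query that such a scheme must hide from the cloud provider is an ordinary shortest-path search over a road graph. The sketch below is plain Dijkstra with no privacy protection whatsoever; the adjacency-list encoding, the node names, and the `shortest_path` function are illustrative assumptions, not part of ESPADE.

```python
import heapq


def shortest_path(graph, source, target):
    """Plaintext Dijkstra. graph maps node -> list of (neighbor, nonnegative weight).

    Baseline only: a cloud running this would learn both endpoints and the route.
    Returns (distance, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical road network: edge weights are travel costs.
road_graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 1)],
}
print(shortest_path(road_graph, "A", "D"))  # (4, ['A', 'C', 'B', 'D'])
```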


Information Extraction From Arabic Websites Based On A Rule-Based Method, Moustafa Alhajj, Amani Sabra Oct 2020

Al Jinan الجنان

This article describes a tool that uses language engineering to extract information from Arabic-language websites. Web documentalists will use this information to create archival records for the sites. An archival record template is proposed, the objective being to fill it in automatically. To recognize and classify text segments, the contextual exploration method proposed by Desclés is used; the linguistic markers and rules are defined on the basis of a synthetic study of the specific features of the Arabic language. A corpus of more than 1300 Arabic-language websites has …
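In the contextual exploration method, an indicator marker triggers a rule, and contextual clues found around it confirm which category the segment belongs to. A minimal sketch of that idea, assuming English example text, two hypothetical archival-record fields, and clue search limited to the same sentence; the authors' actual markers and rules are defined for Arabic and are not given in the abstract.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    category: str  # hypothetical archival-record field to fill
    indicator: str  # regex for the indicator marker that triggers the rule
    clues: list  # regexes; at least one must occur in the same sentence


# Hypothetical English rules for illustration; the paper's rules target Arabic text.
RULES = [
    Rule("contact", r"\bcontact\b", [r"@", r"\+?\d[\d\s-]{6,}"]),
    Rule("founding_date", r"\b(founded|established)\b", [r"\b(19|20)\d{2}\b"]),
]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract(text, rules=RULES):
    """Fill record fields with sentences where an indicator and a confirming clue co-occur."""
    record = {}
    for sentence in split_sentences(text):
        for rule in rules:
            triggered = re.search(rule.indicator, sentence, re.IGNORECASE)
            confirmed = any(re.search(clue, sentence) for clue in rule.clues)
            if triggered and confirmed:
                record.setdefault(rule.category, []).append(sentence)
    return record


page = "The library was founded in 1998. Contact us at info@example.org. We were established long ago."
print(extract(page))
```

The third sentence contains the indicator "established" but no year clue, so the rule does not fire. This shows how clues filter out false triggers.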


Automated Discussion Analysis - Framework For Knowledge Analysis From Class Discussions, Swapna Gottipati, Venky Shankararaman, Mallikan Gokarn Nitin Oct 2020

Research Collection School Of Computing and Information Systems

This research full paper describes knowledge management of class discussions using an analytics-based framework. Discussions, whether in live classrooms or through online forums, can help stimulate critical thinking when used as a teaching method. They allow the teacher to explore the key concepts covered in the course in depth, motivate students to articulate their ideas clearly, and challenge students to think more deeply. Analysing the discussions helps instructors gain better insight into the personal and collaborative learning behaviour of students. However, knowledge from in-class discussions and online forums is not effectively captured and mined due to a lack of appropriate automated …


Visual Sentiment Analysis For Review Images With Item-Oriented And User-Oriented CNN: Reproducibility Companion Paper, Quoc Tuan Truong, Hady W. Lauw, Martin Aumuller, Naoko Nitta Oct 2020

Research Collection School Of Computing and Information Systems

We revisit our contributions on visual sentiment analysis for online review images published at ACM Multimedia 2017, where we developed item-oriented and user-oriented convolutional neural networks that better capture the interaction of image features with specific expressions of users or items. In this work, we outline the experimental claims and describe the procedures to reproduce the results therein. In addition, we provide artifacts, including data sets and code, to replicate the experiments.


Cover Song Identification - A Novel Stem-Based Approach To Improve Song-To-Song Similarity Measurements, Lavonnia Newman, Dhyan Shah, Chandler Vaughn, Faizan Javed Sep 2020

SMU Data Science Review

Music is part of our daily lives, whether intentionally or unintentionally. It evokes responses and behavior, so much so that there is an entire field dedicated to the psychology of music. Music sets the mood for dancing, exercising, creative thought, or relaxation. It is a powerful tool that can be used in various venues and in advertisements to influence and guide human reactions. Music is also often "borrowed" in the industry today. The practices of sampling and remixing music in the digital age have made cover song identification an active area of research. While most of this research is focused …


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding, and the failure rate is intimidating. Drug discovery has had limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, in which a single disease-associated gene is identified and a molecular binder to that specific target is subsequently designed. Under this simplistic paradigm, a drug molecule is assumed to interact only with its intended target. However, small-molecule drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


Blockchain Technology And Freight Forwarder Exploration Of Implications Focused On Practitioners In Shanghai, Johannes Van Bohemen Aug 2020

World Maritime University Dissertations

No abstract provided.


Multi‑View Clustering For Multi‑Omics Data Using Unified Embedding, Mohammed Hasanuzzaman, Sayantan Mitra, Sriparna Saha Aug 2020

Department of Computer Science Publications

In real-world applications, data sets often comprise multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest-neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across different views and to generate a unified embedding in a low-dimensional space that optimally preserves neighbourhood identity. The probability distributions generated for each point in each view are combined by the conflation method to create a single unified distribution. The goal is to approximate this unified distribution as closely as possible when …
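For discrete distributions, conflation is the pointwise product of the distributions, renormalized to sum to 1: Q(x) = prod_v p_v(x) / sum_y prod_v p_v(y). A minimal sketch, assuming SNE-style Gaussian neighbour probabilities per view with a fixed bandwidth `sigma`. The paper's exact per-view distribution and its embedding optimization are not shown, and the function names are hypothetical.

```python
import math


def neighbor_distribution(points, i, sigma=1.0):
    """SNE-style p(j | i) for one view: Gaussian weights on squared distances, normalized over j != i."""
    weights = {}
    for j, x in enumerate(points):
        if j != i:
            sq = sum((a - b) ** 2 for a, b in zip(points[i], x))
            weights[j] = math.exp(-sq / (2 * sigma ** 2))
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def conflate(distributions):
    """Conflation of discrete distributions: pointwise product, renormalized to sum to 1."""
    support = set.intersection(*(set(d) for d in distributions))
    product = {x: math.prod(d[x] for d in distributions) for x in support}
    total = sum(product.values())
    return {x: v / total for x, v in product.items()}


# Two hypothetical views of the same three samples (e.g. two omics layers).
view_a = [[0.0], [1.0], [5.0]]
view_b = [[0.0, 0.0], [0.0, 1.0], [0.0, 4.0]]
unified = conflate([neighbor_distribution(v, 0) for v in (view_a, view_b)])
print(unified)  # sample 1 dominates: it is sample 0's nearest neighbour in both views
```

Because conflation multiplies probabilities, a neighbour must be close in every view to keep high mass. This is what lets the unified distribution reflect consensus across views.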


Colleague To Banner Migration: Data Conversion Guide For Institutional Research, Laura Osborn Aug 2020

Masters Theses & Doctoral Dissertations

When the SDBOR decided to migrate its current student information system into a system shared with HR and Finance, adjustments had to be made to accommodate existing Banner settings and to work around tables that were already populated with HRFIS data. The change in the data type of the student identifier from a 7-digit numeric field to a 9-digit alphanumeric field poses problems for running aggregate data calculations. Additional complications include information, such as first-generation status, that was not migrated between the systems, and cases such as college coding where tables that were designed for student …
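One common way to keep aggregates working across a change of identifier type is to treat every ID as text, map legacy IDs through a crosswalk table, and only then count. The sketch below assumes a hypothetical crosswalk, hypothetical field names (`student_id`, `term`), and made-up ID values. It is not the guide's procedure, and Python is used here only for illustration.

```python
from collections import Counter


def normalize_legacy_id(raw):
    """Legacy IDs are 7-digit numerics; restore leading zeros lost when stored as numbers."""
    text = str(raw).strip()
    if not text.isdigit() or len(text) > 7:
        raise ValueError(f"not a 7-digit legacy ID: {raw!r}")
    return text.zfill(7)


def to_banner_ids(records, crosswalk):
    """Swap each legacy ID for its 9-character Banner ID; collect records with no mapping."""
    converted, unmatched = [], []
    for rec in records:
        legacy = normalize_legacy_id(rec["student_id"])
        if legacy in crosswalk:
            converted.append({**rec, "student_id": crosswalk[legacy]})
        else:
            unmatched.append(rec)
    return converted, unmatched


# Hypothetical crosswalk and enrollment rows, for illustration only.
crosswalk = {"0012345": "A00012345", "0067890": "A00067890"}
records = [
    {"student_id": 12345, "term": "2020FA"},
    {"student_id": "0067890", "term": "2020FA"},
    {"student_id": 12345, "term": "2021SP"},
    {"student_id": 999, "term": "2020FA"},
]
converted, unmatched = to_banner_ids(records, crosswalk)
# Aggregate on the text ID: count distinct students per term.
headcount = Counter(term for _, term in {(r["student_id"], r["term"]) for r in converted})
print(headcount, len(unmatched))
```

Reporting unmatched rows separately, rather than silently dropping them, makes gaps in the crosswalk visible before they distort the aggregates.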


Learning Transferrable Parameters For Long-Tailed Sequential User Behavior Modeling, Jianwen Yin, Chenghao Liu, Weiqing Wang, Jianling Sun, Steven C. H. Hoi Aug 2020

Research Collection School Of Computing and Information Systems

Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling depends heavily on the scale and quality of historical behaviors. However, the number of behaviors per user inherently follows a long-tailed distribution, which has seldom been explored. In this work, we argue that focusing on tail users could bring more benefits, and we address the long-tail issue by learning transferrable parameters from both optimization and feature perspectives. Specifically, we propose a gradient alignment optimizer and adopt an adversarial training scheme to facilitate knowledge transfer …


A Unified Framework For Sparse Online Learning, Peilin Zhao, Dayong Wong, Pengcheng Wu, Steven C. H. Hoi Aug 2020

Research Collection School Of Computing and Information Systems

The amount of data in our society has been exploding in the era of big data. This article aims to address several open challenges in big data stream classification. Many existing studies in the data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream classification task. Unlike existing online data stream classification techniques, we propose a unified Sparse Online Classification (SOC) framework. Based on SOC, we derive a second-order online learning algorithm and a cost-sensitive sparse online …
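The abstract does not specify SOC beyond its being a unified sparse online framework with second-order and cost-sensitive variants. For orientation, the sketch below shows a generic first-order sparse online learner: online logistic regression with a per-example L1 soft-thresholding (FOBOS-style proximal) step. This is a textbook technique swapped in for illustration, not SOC. The learning rate, L1 strength, class name, and toy stream are assumptions.

```python
import math


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink toward zero, clamping small weights to exactly 0."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SparseOnlineLogistic:
    """Online logistic regression: one gradient step per example, then L1 soft-thresholding."""

    def __init__(self, dim, lr=0.1, l1=0.01):
        self.w = [0.0] * dim
        self.lr = lr
        self.l1 = l1

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, y):
        """Process a single streamed example with label y in {0, 1}."""
        g = self.predict_proba(x) - y  # derivative of log-loss w.r.t. the score z
        for i, xi in enumerate(x):
            step = self.w[i] - self.lr * g * xi  # gradient step
            self.w[i] = soft_threshold(step, self.lr * self.l1)  # L1 proximal step

    def nonzeros(self):
        return sum(1 for wi in self.w if wi != 0.0)


# Hypothetical stream: feature 0 carries the label, feature 1 is never active.
model = SparseOnlineLogistic(dim=2)
for t in range(200):
    y = t % 2
    model.update([1.0 if y else -1.0, 0.0], y)
print(model.w, model.nonzeros())
```

Each example is seen once and then discarded, which is what makes the approach suitable for streams. The thresholding step keeps uninformative weights at exactly zero, so the model stays sparse.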


Maia And Admonita: Mandatory Integrity Control Language And Dynamic Trust Framework For Arbitrary Structured Data, Wassnaa Al-Mawee Aug 2020

Dissertations

The growing number of attacks against the information systems of companies that operate nuclear power stations and other energy facilities in the United States and other countries is noticeable and has potentially catastrophic real-world implications. Data integrity is a fundamental component of information security. It refers to the accuracy and trustworthiness of data or resources. Data integrity within information systems becomes an increasingly important factor in security protection as data becomes more integrated and more crucial to decision-making. Security threats arising from human errors, whether malicious or unintentional, such as viruses, hacking, and many other cybersecurity threats, are dangerous and require mandatory …


Visual Analytics Of Electronic Health Records With A Focus On Acute Kidney Injury, Sheikh S. Abdullah Jul 2020

Electronic Thesis and Dissertation Repository

The increasing use of electronic platforms in healthcare has resulted in the generation of unprecedented amounts of data in recent years. The amount of data available to clinical researchers, physicians, and healthcare administrators continues to grow, which creates an untapped resource with the ability to improve the healthcare system drastically. Despite the enthusiasm for adopting electronic health records (EHRs), some recent studies have shown that EHR-based systems hardly improve the ability of healthcare providers to make better decisions. One reason for this inefficacy is that these systems do not allow for human-data interaction in a manner that fits and supports …


Novel Technique To Analyze The Effects Of Cognitive And Non-Cognitive Predictors On Students Course Withdrawal In College, Mohammed Ali Jul 2020

Technology Faculty Publications and Presentations

A novel technique was applied to a college student database to identify the cognitive and non-cognitive factors that predict college students’ course withdrawal behaviors. Predictors such as high school grade point average (HSGPA), standardized test scores (American College Test [ACT] or Scholastic Aptitude Test [SAT]), number of credit hours enrolled, and age were analyzed in this study. Data mining algorithms were used to analyze records of undergraduate students at a west-south-central state university in the United States. The results revealed that two factors, the number of enrolled credit hours and a student’s age, have the greatest effect on collegiate course withdrawal …


Big Data, Spatial Optimization, And Planning, Kai Cao, Wenwen Li, Richard Church Jul 2020

Research Collection School Of Computing and Information Systems

Spatial optimization represents a set of powerful spatial analysis techniques that can be used to identify optimal solution(s) and even generate a large number of competitive alternatives. Formulating such problems involves maximizing or minimizing one or more objectives while satisfying a number of constraints. Solution techniques range from exact models solved with approaches such as linear and integer programming to heuristic algorithms, e.g., Tabu Search, Simulated Annealing, and Genetic Algorithms. Spatial optimization techniques have been utilized in numerous planning applications, such as location-allocation modeling/site selection, land use planning, school districting, regionalization, routing, and urban design. These methods …
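To make the location-allocation example concrete, consider the classic p-median problem. It opens p facilities so that the total demand-weighted distance from each demand point to its nearest open facility is minimized. The sketch below solves a tiny instance exactly by enumeration, which is feasible only for small inputs. At realistic scale one would use the integer programming or heuristic methods listed above. The instance, the 1-D distance function, and the `p_median` name are illustrative assumptions.

```python
from itertools import combinations


def p_median(demand, candidates, p, dist):
    """Exact p-median by enumeration.

    demand: list of (location, weight); candidates: possible facility sites.
    Each demand point is served by its nearest open site; minimize total weighted distance.
    """
    best_sites, best_cost = None, float("inf")
    for sites in combinations(candidates, p):
        cost = sum(w * min(dist(loc, s) for s in sites) for loc, w in demand)
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Hypothetical 1-D instance: demand points with weights, five candidate sites.
demand = [(0, 10), (1, 20), (9, 5), (10, 15)]
candidates = [0, 1, 5, 9, 10]
print(p_median(demand, candidates, p=2, dist=lambda a, b: abs(a - b)))  # ((1, 10), 15)
```

Enumeration checks C(n, p) site combinations, which grows combinatorially. This is why exact ILP solvers and heuristics such as Tabu Search are the practical choices for real planning problems.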


Mining User-Generated Content Of Mobile Patient Portal: Dimensions Of User Experience, Mohammad Al-Ramahi, Cherie Noteboom Jun 2020

Research & Publications

Patient portals are positioned as a central component of patient engagement through their potential to change the physician-patient relationship and enable chronic disease self-management. The incorporation of patient portals promises to deliver excellent quality at optimized costs while improving the health of the population. This study extends the existing literature by extracting dimensions related to mobile patient portal use. We use a topic modeling approach to systematically analyze users’ feedback from the actual use of a common mobile patient portal, Epic’s MyChart. Comparing results of Latent Dirichlet Allocation analysis with those of human analysis validated the extracted …


Cornac: A Comparative Framework For Multimodal Recommender Systems, Aghiles Salah, Quoc Tuan Truong, Hady W. Lauw May 2020

Research Collection School Of Computing and Information Systems

Cornac is an open-source Python framework for multimodal recommender systems. In addition to core utilities for accessing, building, evaluating, and comparing recommender models, Cornac is distinctive in putting emphasis on recommendation models that leverage auxiliary information in the form of a social network, item textual descriptions, product images, etc. Such multimodal auxiliary data supplement user-item interactions (e.g., ratings, clicks), which tend to be sparse in practice. To facilitate broad adoption and community contribution, Cornac is publicly available at https://github.com/PreferredAI/cornac, and it can be installed via Anaconda or the Python Package Index (pip). Not only is it well-covered by unit tests …


Storage Management Strategy In Mobile Phones For Photo Crowdsensing, En Wang, Zhengdao Qu, Xinyao Liang, Xiangyu Meng, Yongjian Yang, Dawei Li, Weibin Meng Apr 2020

Department of Computer Science Faculty Scholarship and Creative Works

In mobile crowdsensing, users jointly complete a sensing task using the sensors in their intelligent terminals. In particular, photo crowdsensing based on Mobile Edge Computing (MEC) collects pictures of specific targets or events and uploads them to nearby edge servers, which yields richer data content and more efficient data storage than common mobile crowdsensing; hence, it has attracted considerable attention recently. However, mobile users prefer uploading photos through Wi-Fi APs (PoIs) rather than cellular networks. Therefore, photos stored in mobile phones are exchanged among users, in order to …


Feature Extraction And Analysis Of Binaries For Classification, Micah Flack Apr 2020

Annual Research Symposium

The research project "Feature Extraction and Analysis of Binaries for Classification" provides an in-depth examination of the features shared by unlabeled binary samples, in order to classify them as benign or malicious software using several different methods. Because manually analyzing or reverse engineering binaries to determine their function takes time, the ability to gather features and then instantly classify samples without explicitly programming the solution is very valuable. An online service could be used instead; however, this is not always viable, depending on the sensitivity of the binary. With Python 3 and the pefile library, we …
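The abstract does not list the extracted features. One widely used, format-independent feature for binaries is the Shannon entropy of the byte distribution, since packed or encrypted content tends toward 8 bits per byte. The sketch below computes this entropy along with a normalized 256-bin byte histogram, using only the standard library. It does not use pefile, and the choice of features is an assumption rather than the project's feature set.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)  # iterating bytes yields ints 0..255
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def byte_histogram(data: bytes) -> list:
    """Normalized 256-bin byte histogram: a fixed-length feature vector for a classifier."""
    counts = Counter(data)
    n = len(data) or 1
    return [counts.get(b, 0) / n for b in range(256)]


# In practice the bytes would come from a sample file, e.g. open(path, "rb").read().
print(byte_entropy(b"\x00" * 64))       # constant content: 0 bits per byte
print(byte_entropy(bytes(range(256))))  # uniform bytes: 8.0 bits per byte
```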


Predictive Task Assignment In Spatial Crowdsourcing: A Data-Driven Approach, Yan Zhao, Kai Zheng, Yue Cui, Han Su, Feida Zhu, Xiaofang Zhou Apr 2020

Research Collection School Of Computing and Information Systems

With the rapid development of mobile networks and the widespread usage of mobile devices, spatial crowdsourcing, which refers to assigning location-based tasks to moving workers, has drawn increasing attention. One of the major issues in spatial crowdsourcing is task assignment, which allocates tasks to appropriate workers. However, existing works generally assume static offline scenarios, where the spatio-temporal information of all workers and tasks is determined and known a priori. Ignoring the dynamic spatio-temporal distributions of workers and tasks can often lead to poor assignment results. In this work, we study a novel spatial crowdsourcing problem, namely Predictive …
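For contrast with the predictive setting, here is a sketch of the kind of static baseline the abstract describes. All task and worker locations are known up front, and pairs are matched greedily by distance within a reachable range. This is not the paper's predictive method. The `greedy_assign` function, the coordinates, and the `max_dist` range are hypothetical.

```python
import math


def greedy_assign(tasks, workers, max_dist):
    """Static greedy matching: repeatedly take the closest unassigned (task, worker) pair.

    tasks, workers: id -> (x, y), all known up front; pairs farther than max_dist are ignored.
    """
    pairs = sorted(
        (math.dist(t_loc, w_loc), t_id, w_id)
        for t_id, t_loc in tasks.items()
        for w_id, w_loc in workers.items()
        if math.dist(t_loc, w_loc) <= max_dist
    )
    result, busy = {}, set()
    for _, t_id, w_id in pairs:
        if t_id not in result and w_id not in busy:
            result[t_id] = w_id
            busy.add(w_id)
    return result


tasks = {"t1": (0, 0), "t2": (5, 5)}
workers = {"w1": (1, 0), "w2": (5, 4), "w3": (20, 20)}
print(greedy_assign(tasks, workers, max_dist=3))  # {'t1': 'w1', 't2': 'w2'}
```

The baseline decides using only current positions. A task that appears later near an already-committed worker cannot benefit from it, which is the weakness a predictive approach targets.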


Data Governance And The Emerging University, Michael J. Madison Jan 2020

Book Chapters

Knowledge and information governance questions are tractable primarily in institutional terms, rather than in terms of abstractions such as knowledge itself or individual or social interests. This chapter offers the modern research university as an example. Practices of data-intensive research by university-based researchers, sometimes reduced to the popular phrase “Big Data,” pose governance challenges for the university. The chapter situates those challenges in the traditional understanding of the university as an institution for understanding forms and flows of knowledge. At a broad level, the chapter argues that the new salience of data exposes emerging shifts in the social, cultural, and …


A Systematic Literature Survey Of Unmanned Aerial Vehicle Based Structural Health Monitoring, Sreehari Sreenath Jan 2020

Theses, Dissertations and Capstones

Unmanned Aerial Vehicles (UAVs) are being employed in a multitude of civil applications owing to their ease of use, low maintenance, affordability, high mobility, and ability to hover. UAVs are being utilized for real-time monitoring of road traffic, providing wireless coverage, remote sensing, search and rescue operations, delivery of goods, security and surveillance, precision agriculture, and civil infrastructure inspection. They are the next big revolution in technology and civil infrastructure, and they are expected to command a market value of more than $45 billion. The thesis surveys the literature on UAV-assisted Structural Health Monitoring (SHM) over the last decade and categorizes UAVs …


The Trust Principles For Digital Repositories, Dawei Lin, Jonathan Crabtree, Ingrid Dillo, Robert R. Downs, Rorie Edmunds, David Giaretta, Marisa De Giusti, Hervé L'Hours, Wim Hugo, Reyna Jenkyns, Varsha Khodiyar, Maryann E. Martone, Mustapha Mokrane, Vivek Navale, Jonathan Petters, Barbara Sierman, Dina V. Sokolova, Martina Stockhause, John Westbrook Jan 2020

Copyright, Fair Use, Scholarly Communication, etc.

As information and communication technology has become pervasive in our society, we are increasingly dependent on both digital data and repositories that provide access to and enable the use of such resources. Repositories must earn the trust of the communities they intend to serve and demonstrate that they are reliable and capable of appropriately managing the data they hold.

Following a year-long public discussion and building on existing community consensus, several stakeholders, representing various segments of the digital repository community, have collaboratively developed and endorsed a set of guiding principles to demonstrate digital repository trustworthiness. Transparency, Responsibility, User focus, …


Building Something With The Raspberry Pi, Richard Kordel Jan 2020

Presidential Research Grants

In 2017, Ryan Korn and I submitted a grant proposal in the annual Harrisburg University President’s Grant process. Our proposal was to partner with a local high school to install a classroom of 20 Raspberry Pis, along with the requisite peripherals. In that classroom, students would be challenged to design something that combined programming with physical computing. In our presentation to the school, we suggested that this project would give students the opportunity to be “amazing.”

As part of the grant, the top three students would be given scholarships to HU and the top five finalists would all be permitted …


A Study On Real-Time Database Technology And Its Applications, Geethmi Nimantha Dissanayake Jan 2020

Masters Theses

No abstract provided.


Synthesizing Aspect-Driven Recommendation Explanations From Reviews, Trung-Hoang Le, Hady W. Lauw Jan 2020

Research Collection School Of Computing and Information Systems

Explanations help to make sense of recommendations, increasing the likelihood of adoption. However, existing approaches to explainable recommendations tend to rely on rigid, standardized templates, customized only via fill-in-the-blank aspect sentiments. For more flexible, literate, and varied explanations covering various aspects of interest, we synthesize an explanation by selecting snippets from reviews, while optimizing for representativeness and coherence. To fit target users' aspect preferences, we contextualize the opinions based on a compatible explainable recommendation model. Experiments on datasets from several product categories demonstrate the efficacy of our method compared to baselines based on templates, review summarization, selection, and text …
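As a simplified stand-in for the selection step, the sketch below greedily picks review snippets that add the most preference-weighted aspect coverage for a target user. It ignores coherence entirely and is not the authors' optimization or recommendation model. The aspect tags, preference weights, and the `select_snippets` name are hypothetical.

```python
def select_snippets(snippets, preferences, k):
    """Greedy selection: each round, take the snippet adding the most preference-weighted new aspects.

    snippets: list of (text, set_of_aspects); preferences: aspect -> weight for the target user.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for text, aspects in snippets:
            if text in chosen:
                continue
            gain = sum(preferences.get(a, 0.0) for a in aspects - covered)
            if gain > best_gain:
                best, best_gain = (text, aspects), gain
        if best is None:  # nothing left adds coverage
            break
        chosen.append(best[0])
        covered |= best[1]
    return chosen


reviews = [
    ("Great battery life.", {"battery"}),
    ("Sharp screen and long battery life.", {"screen", "battery"}),
    ("Fast shipping.", {"shipping"}),
]
prefs = {"battery": 0.6, "screen": 0.3, "shipping": 0.1}
print(select_snippets(reviews, prefs, k=2))  # ['Sharp screen and long battery life.', 'Fast shipping.']
```

Counting only newly covered aspects discourages redundant snippets. In this example, the second battery-only review is skipped once battery is already covered.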