Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

Theses/Dissertations

Articles 1 - 30 of 117

Full-Text Articles in Computer Sciences

Cannabidiol Tweet Miner: A Framework For Identifying Misinformation In Cbd Tweets., Jason Turner Aug 2023

Electronic Theses and Dissertations

As regulations surrounding cannabis continue to develop, the demand for cannabis-based products is on the rise. Despite not producing the psychoactive effects commonly associated with THC, products containing cannabidiol (CBD) have gained immense popularity in recent years as a potential treatment option for a range of conditions, particularly those associated with pain or sleep disorders. However, due to current federal policies, these products have yet to undergo comprehensive safety and efficacy testing. Fortunately, utilizing advanced natural language processing (NLP) techniques, data harvested from social networks have been employed to investigate various social trends within healthcare, such as disease tracking and …


Campus Safety Data Gathering, Classification, And Ranking Based On Clery-Act Reports, Walaa F. Abo Elenin Jan 2023

Electronic Theses and Dissertations

Most existing campus safety rankings are based on criminal incident history with minimal or no consideration of campus security conditions and standard safety measures. Campus safety information published by universities/colleges is usually conceptual/qualitative rather than quantitative and is based on the criminal records of these campuses. Thus, no explicit and trusted ranking method for these campuses considers the level of compliance with the standard safety measures. A quantitative safety measure is important to compare different campuses easily and to learn about specific campus safety conditions.

In this thesis, we utilize Clery-Act reports of campuses to automatically analyze their safety conditions and generate …


Generative Methods, Meta-Learning, And Meta-Heuristics For Robust Cyber Defense, Marc W. Chale Sep 2022

Theses and Dissertations

Cyberspace is the digital communications network that supports the internet of battlefield things (IoBT), the model by which defense-centric sensors, computers, actuators and humans are digitally connected. A secure IoBT infrastructure facilitates real time implementation of the observe, orient, decide, act (OODA) loop across distributed subsystems. Successful hacking efforts by cyber criminals and strategic adversaries suggest that cyber systems such as the IoBT are not secure. Three lines of effort demonstrate a path towards a more robust IoBT. First, a baseline data set of enterprise cyber network traffic was collected and modelled with generative methods allowing the generation of realistic, …


Design And Analysis Of Strategic Behavior In Networks, Sixie Yu Aug 2022

McKelvey School of Engineering Theses & Dissertations

Networks permeate every aspect of our social and professional life. A networked system with strategic individuals can represent a variety of real-world scenarios with socioeconomic origins. In such a system, the individuals' utilities are interdependent: one individual's decision influences the decisions of others and vice versa. In order to gain insights into the system, the highly complicated interactions necessitate some level of abstraction. To capture the otherwise complex interactions, I use a game theoretic model called Networked Public Goods (NPG) game. I develop a computational framework based on NPGs to understand strategic individuals' behavior in networked systems. The framework consists of three …


Solving The Challenges Of Concept Drift In Data Stream Classification., Hanqing Hu Aug 2022

Electronic Theses and Dissertations

The rise of network-connected devices and applications leads to a significant increase in the volume of data that are continuously generated over time, called data streams. In real-world applications, storing the entirety of a data stream for later analysis is often not practical, due to the data stream’s potentially infinite volume. Data stream mining techniques and frameworks are therefore created to analyze streaming data as they arrive. However, compared to traditional data mining techniques, challenges unique to data stream mining also emerge, due to the high arrival rate of data streams and their dynamic nature. In this dissertation, …


Data-Driven Operational And Safety Analysis Of Emerging Shared Electric Scooter Systems, Qingyu Ma Dec 2021

Computational Modeling & Simulation Engineering Theses & Dissertations

The rapid rise of shared electric scooter (E-Scooter) systems offers many urban areas a new micro-mobility solution. The portable and flexible characteristics have made E-Scooters a competitive mode for short-distance trips. Compared to other modes such as bikes, E-Scooters allow riders to freely ride on different facilities such as streets, sidewalks, and bike lanes. However, sharing lanes with vehicles and other users tends to cause safety issues for riding E-Scooters. Conventional methods are often not applicable for analyzing such safety issues because well-archived historical crash records are not commonly available for emerging E-Scooters.

Perceiving the growth of such a micro-mobility …


Modeling Of Argon Bombardment And Densification Of Low Temperature Organic Precursors Using Reactive Md Simulations And Machine Learning, Kwabena Asante-Boahen Aug 2021

MSU Graduate Theses

In this study, an important aspect of the synthesis process for a-BxC:Hy was systematically modeled by using Reactive Molecular Dynamics (MD) to model the argon bombardment of orthocarborane molecules as the precursor. The MD simulations are used to assess the dynamics associated with the free radicals that result from the ion bombardment. By applying Data Mining/Machine Learning analysis to the datasets generated from the large reactive MD simulations, I was able to identify and quantify the kinetics of these radicals. Overall, this approach allows for a better understanding of the overall mechanism at the atomistic level of …


Exploring The Use Of Social Media To Infer Relationships Between Demographics, Psychographics And Vaccine Hesitancy, Abhimanyu Kapur Jun 2021

Computer Science Senior Theses

The growing popularity of social media as a platform to obtain information and share one's opinions on various topics makes it a rich source of information for research. In this study, we aimed to develop a framework to infer relationships between demographic and psychographic characteristics of a user and their opinion on a specific narrative - in this case, their stance on taking the COVID-19 vaccine. Twitter was the chosen platform due to the large USA user base and easily available data. Demographic traits included Race, Age, Gender, and Human-vs-Organization Status. Psychographic traits included the Big Five personality traits (Conscientiousness, …


Mining Subgroups From Temporal Data : From The Parts To The Whole, Alexander Gorovits May 2021

Legacy Theses & Dissertations (2009 - 2024)

A variety of dynamic systems can be broken down into potentially overlapping subcomponents with varying temporal behavior, ranging from communities in networks, to clusters of trajectories in spatiotemporal data, to co-evolving subsets within multivariate time series. Using explicit regularization on various temporal behaviors within a tensor factorization framework, I demonstrate means to mine these subgroups along with their temporal activities, as well as how that yields information about the overall systems. Additionally, I adapt this notion of temporal communities to the spatiotemporal setting to develop a reinforcement learning approach for optimizing co-ordinated communication between independent agents.


Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi Jan 2021

Theses and Dissertations (Comprehensive)

This thesis addresses feature selection (FS) problems; FS is a primary stage in data mining. It is a significant pre-processing stage that enhances the performance of the process with regard to computation cost and accuracy and offers a better comprehension of stored data by removing unnecessary and irrelevant features from the basic dataset. However, because of the size of the problem, FS is known to be very challenging and has been classified as an NP-hard problem. Traditional methods can only be used to solve small problems. Therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing the FS problems. …


Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi Dec 2020

Dissertations

Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, environment, and climate change monitoring. Traditional data mining algorithms do not scale well to big data due to the enormous number of data points and the velocity of their generation. Mining and learning from big data need time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on the mining of big data using aggregated data as input. We developed a data structure that is to be used to aggregate data at multiple resolutions. …


Structured Data Mining Networks, Time Series, And Time Series Of Networks, Lin Zhang Dec 2020

Legacy Theses & Dissertations (2009 - 2024)

The rate at which data is generated in modern applications has created an unprecedented demand for novel methods to effectively and efficiently extract insightful patterns. Methods aware of known domain-specific structure in the data tend to be advantageous. In particular, a joint temporal and networked view of observations offers a holistic lens to many real-world systems. Example domains abound: activity of social network users, gene interactions over time, a temporal load of infrastructure networks, and others. Existing analysis and mining approaches for such data exhibit limited quality and scalability due to their sensitivity to noise, missing observations, and the need …


Hybrid Deep Neural Networks For Mining Heterogeneous Data, Xiurui Hou Aug 2020

Dissertations

In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are desired to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple diverse heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.

The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and …


Mobile Location Data Analytics, Privacy, And Security, Yunhe Feng Aug 2020

Doctoral Dissertations

Mobile location data are ubiquitous in the digital world. People intentionally and unintentionally generate large amounts of location data when connecting to cellular networks or sharing posts on social networks. As mobile devices normally choose to communicate with nearby cell towers outdoors, it is reasonable to infer human locations based on cell tower coordinates. Many social networking platforms, such as Twitter, allow users to geo-tag their posts optionally, publishing personal locations to friends or everyone. These location data are particularly useful for understanding mobile usage behaviors and human mobility patterns. Meanwhile, the public expresses great concern about the privacy and security of …


Detecting Undisclosed Paid Editing In Wikipedia, Nikesh Joshi Aug 2020

Boise State University Theses and Dissertations

Wikipedia is a free and open-collaboration based online encyclopedia. The website has millions of pages that are maintained by thousands of volunteer editors. It is part of Wikipedia’s fundamental principles that pages are written with a neutral point of view and are maintained by volunteer editors for free with well-defined guidelines in order to avoid or disclose any conflict of interest. However, there have been several known incidents where editors intentionally violate such guidelines in order to get paid (or even extort money) for maintaining promotional spam articles without disclosing such information.

This thesis addresses for the first time the …


Detecting Credit Card Fraud: An Analysis Of Fraud Detection Techniques, William Lovo May 2020

Senior Honors Projects, 2020-current

Advancements in the modern age have brought many conveniences, one of those being credit cards. Providing an individual the ability to hold their entire purchasing power in the form of pocket-sized plastic cards has made credit cards the preferred method to complete financial transactions. However, these systems are not infallible and may provide criminals and other bad actors the opportunity to abuse them. Financial institutions and their customers lose billions of dollars every year to credit card fraud. To combat this issue, fraud detection systems are deployed to discover fraudulent activity after it has occurred. Such systems rely on advanced …


Asking Questions Is Easy, Asking Great Questions Is Hard: Constructing Effective Stack Overflow Questions, Jane W. Hsieh Jan 2020

Honors Papers

This paper explores and seeks to improve the ways in which Stack Overflow question posts can elicit answers. Using statistical data analysis approaches and reviews of existing literature, we pinpoint three key factors that are found in many previously successful/answerable questions. We then present a prototypical sidebar for the ask page that leverages these factors to dynamically (1) evaluate the quality of questions in construction, (2) display answer previews of relevant questions, and (3) scaffold the identified factors to subsequent askers during their question development processes.


Searching For Needles In The Cosmic Haystack, Thomas Ryan Devine Jan 2020

Graduate Theses, Dissertations, and Problem Reports

Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must utilize parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare, e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification with no consensus on a best …


Bullynet: Unmasking Cyberbullies On Social Networks, Aparna Sankaran Dec 2019

Boise State University Theses and Dissertations

Social media has changed the way people communicate with each other, and consequently affected people's ability to empathize in both positive and negative ways. One of the most harmful consequences of social media is the rise of cyberbullying, which tends to be more sinister than traditional bullying given that online records typically live on the internet for quite a long time and are hard to control. In this thesis, we present a three-phase algorithm, called BullyNet, for detecting cyberbullies on the Twitter social network. We exploit bullying tendencies by proposing a robust method for constructing a cyberbullying signed network. BullyNet analyzes …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of the ultrasound images is time-consuming and usually has a high rate of false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has proved not to solve the problem due to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually …


Feature Space Modeling For Accurate And Efficient Learning From Non-Stationary Data, Ayesha Akter Oct 2019

Doctoral Dissertations

A non-stationary dataset is one whose statistical properties such as the mean, variance, correlation, probability distribution, etc. change over a specific interval of time. On the contrary, a stationary dataset is one whose statistical properties remain constant over time. Apart from the volatile statistical properties, non-stationary data poses other challenges such as time and memory management due to the limitation of computational resources mostly caused by the recent advancements in data collection technologies which generate a variety of data at an alarming pace and volume. Additionally, when the collected data is complex, managing data complexity, emerging from its dimensionality and …


Applied Deep Learning In Intelligent Transportation Systems And Embedding Exploration, Xiaoyuan Liang Aug 2019

Dissertations

Deep learning techniques have achieved tremendous success in many real applications in recent years and show great potential in many areas, including transportation. Even though transportation has become increasingly indispensable in people’s daily life, its related problems, such as traffic congestion and energy waste, have not been completely solved, and some have become even more critical. This dissertation focuses on solving the following fundamental problems in the transportation field using deep learning: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization …


Phenomena Of Social Dynamics In Online Games, Essa Alhazmi Jul 2019

USF Tampa Graduate Theses and Dissertations

Online communities exhibit dynamic social phenomena that, if understood, can both influence the design of technical platforms and inform theories about general social dynamics. With increasing popularity, online games provide a rich recording of social dynamics that can contribute to understanding human behavior. This dissertation studies two phenomena of social dynamics at large scale using data traces from online games. The first phenomenon is team formation and the second is player mobility between gaming servers.

This dissertation first presents a framework for collecting data from online gaming through crawling. It includes the data sources and the tools used for data …


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides complementary useful perspectives, but also poses new challenges to the representation and integration into the learning procedure. In this dissertation, the involved spatial and temporal dependencies are explored with three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for the dependency representation in respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


Knowing Without Knowing: Real-Time Usage Identification Of Computer Systems, Leila Mohammed Hawana Jan 2019

Dissertations and Theses

Contemporary computers attempt to understand a user's actions and preferences in order to make decisions that better serve the user. In pursuit of this goal, computers can make observations that range from simple pattern recognition to listening in on conversations without the device being intentionally active. While these developments are incredibly useful for customization, the inherent security risks involving personal data are not always worth it. This thesis attempts to tackle one issue in this domain, computer usage identification, and presents a solution that identifies high-level usage of a system at any given moment without looking into any personal data. …


A Data Mining Framework For Improving Student Outcomes On Step 1 Of The United States Medical Licensing Examination, James Clark Jan 2019

CCE Theses and Dissertations

Identifying the factors associated with medical students who fail Step 1 of the United States Medical Licensing Examination (USMLE) has been a focus of investigation for many years. Some researchers believe lower scores on the Medical Colleges Admissions Test (MCAT) are the sole factor used to identify failure. Other researchers believe lower course outcomes during the first two years of medical training are better indicators of failure. Yet, there are medical students who fail Step 1 of the USMLE who enter medical school with high MCAT scores, and conversely medical students with lower academic credentials who are expected to have …


Citationally Enhanced Semantic Literature Based Discovery, John David Fleig Jan 2019

CCE Theses and Dissertations

We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its amazing abilities, the human mind is woefully inadequate in processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find you a document whose content best matches your query - if …


Learning From Heterogeneous Data, Lu Wang Jan 2019

Wayne State University Dissertations

Data with both heterogeneity and homogeneity is now ubiquitous due to the development of multitudinous data collection techniques. To encode the data heterogeneity and homogeneity, we focus on unsupervised and supervised learning approaches. In unsupervised learning, to consider both data heterogeneity and homogeneity, we develop three clustering frameworks to maximize the heterogeneity among data sub-groups and homogeneity within each data sub-group for over-dispersed data in three different data types, i.e., alphabetic, network and mixed feature types data. In supervised learning, the traditional approaches, however, either build a global model for a whole group including all sub-groups, which fail to consider …


Efficient Algorithms For Mining Healthcare Data :, Yan Hu Jan 2019

Legacy Theses & Dissertations (2009 - 2024)

Data-Driven Healthcare (DDH) is defined as the usage of available medical big data to provide the best and most personalized care, which is believed to be one of the most promising directions for transforming healthcare. The healthcare data includes claims and cost data, clinical data, pharmaceutical R&D data, patient behavior and sentiment data, and health data on the web. There has been a remarkable upsurge in the adoption of healthcare data over the past several years. In particular, it has been used for medical concept extraction, patient trajectory modeling, disease inference, etc.


Predictive Analysis Of Real-Time Strategy Games Using Graph Mining, Isam Abdulmunem Alobaidi Jan 2019

Doctoral Dissertations

Machine learning and computational intelligence have facilitated the development of recommendation systems for a broad range of domains. Such recommendations are based on contextual information that is explicitly provided or pervasively collected. Recommendation systems often improve decision-making or increase the efficacy of a task. Real-Time Strategy (RTS) video games are not only a popular entertainment medium, they also are an abstraction of many real-world applications where the aim is to increase your resources and decrease those of your opponent. Using predictive analytics, which examines past examples of success and failure, we can learn how to predict positive outcomes for such …