Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

Articles 31 - 60 of 329

Full-Text Articles in Computer Sciences

Mining Subgroups From Temporal Data : From The Parts To The Whole, Alexander Gorovits May 2021

Legacy Theses & Dissertations (2009 - 2024)

A variety of dynamic systems can be broken down into potentially overlapping subcomponents with varying temporal behavior, ranging from communities in networks, to clusters of trajectories in spatiotemporal data, to co-evolving subsets within multivariate time series. Using explicit regularization on various temporal behaviors within a tensor factorization framework, I demonstrate means to mine these subgroups along with their temporal activities, and show how doing so yields information about the overall systems. Additionally, I adapt this notion of temporal communities to the spatiotemporal setting to develop a reinforcement learning approach for optimizing coordinated communication between independent agents.


Robust Inference Of Kinase Activity Using Functional Networks, Serhan Yılmaz, Marzieh Ayati, Daniela Schlatzer, A. Ercüment Çiçek, Mark A. Chance, Mehmet Koyutürk Feb 2021

Computer Science Faculty Publications and Presentations

Mass spectrometry enables high-throughput screening of phosphoproteins across a broad range of biological contexts. When complemented by computational algorithms, phosphoproteomic data allow the inference of kinase activity, facilitating the identification of dysregulated kinases in various diseases, including cancer, Alzheimer’s disease and Parkinson’s disease. To enhance the reliability of kinase activity inference, we present a network-based framework, RoKAI, that integrates various sources of functional information to capture coordinated changes in signaling. Through computational experiments, we show that phosphorylation of sites in the functional neighborhood of a kinase is significantly predictive of its activity. The incorporation of this knowledge in RoKAI consistently …
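As a hedged illustration of the general idea of borrowing evidence from a kinase's functional neighborhood, here is a minimal sketch that iteratively smooths site-level scores over an undirected network and then averages them per kinase. This is not RoKAI's published algorithm; the functions `propagate_scores` and `kinase_activity`, the weight `alpha`, and the toy network are hypothetical.

```python
def propagate_scores(scores, neighbors, alpha=0.5, iterations=50):
    """Smooth per-site scores over a functional network (hypothetical helper).

    scores:    dict site -> observed score (e.g. log fold change of phosphorylation)
    neighbors: dict site -> list of functionally related sites (all keys of `scores`)
    alpha:     weight kept on each site's own observation (assumed parameter)
    """
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for site, observed in scores.items():
            nbrs = neighbors.get(site, [])
            if nbrs:
                # Blend the site's own observation with the mean of its neighbors
                nbr_mean = sum(current[n] for n in nbrs) / len(nbrs)
                updated[site] = alpha * observed + (1 - alpha) * nbr_mean
            else:
                updated[site] = observed  # isolated sites keep their raw score
        current = updated
    return current


def kinase_activity(substrates, site_scores):
    """Score each kinase as the mean smoothed score of its known substrate sites."""
    return {
        kinase: sum(site_scores[s] for s in sites) / len(sites)
        for kinase, sites in substrates.items()
        if sites
    }


smoothed = propagate_scores(
    {"s1": 2.0, "s2": 0.0, "s3": 0.0},
    {"s1": ["s2"], "s2": ["s1"]},
)
print(kinase_activity({"K1": ["s1", "s2"]}, smoothed))
```

Here `s2` picks up part of `s1`'s signal because the two sites are linked, while the isolated `s3` keeps its observed value.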


Occam Manual, Martin Zwick Jan 2021

Systems Science Faculty Publications and Presentations

Occam is a Discrete Multivariate Modeling (DMM) tool based on the methodology of Reconstructability Analysis (RA). It is typically used to analyze problems involving large numbers of discrete variables. Models consisting of one or more components are developed and then evaluated for their fit and statistical significance. Occam can search the lattice of all possible models, or can do detailed analysis on a specific model.

In Variable-Based Modeling (VBM), model components are collections of variables. In State-Based Modeling (SBM), components identify one or more specific states or substates.

Occam provides a web-based interface, which …
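To illustrate the information-theoretic flavor of RA in the simplest variable-based case, here is a minimal sketch assuming two discrete variables A and B: the information lost when the saturated model AB is replaced by the independence model A:B is the transmission (mutual information) between them. The function name and data layout are hypothetical; this is not Occam's interface, and Occam reports far more than this single quantity.

```python
from collections import Counter
from math import log2


def transmission(pairs):
    """Bits of information lost when the saturated model AB is replaced by A:B.

    pairs: list of (a, b) observations of two discrete variables.
    This equals the mutual information between A and B in the observed data.
    """
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        p_indep = (count_a[a] / n) * (count_b[b] / n)  # the A:B model's prediction
        total += p_ab * log2(p_ab / p_indep)
    return total


print(transmission([(0, 0), (1, 1)]))                  # perfectly dependent → 1.0
print(transmission([(0, 0), (0, 1), (1, 0), (1, 1)]))  # independent → 0.0
```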


Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi Jan 2021

Theses and Dissertations (Comprehensive)

This thesis addresses feature selection (FS) problems, a primary stage in data mining. FS is a significant pre-processing stage that reduces computational cost, improves accuracy, and offers a better understanding of stored data by removing unnecessary and irrelevant features from the original dataset. However, because of the size of the problem, FS is known to be very challenging and has been classified as an NP-hard problem. Traditional methods can only solve small problems; therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing FS problems. …
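Binary metaheuristics for FS commonly encode a candidate feature subset as a bit mask and score it with a fitness that trades classification error against subset size. The sketch below assumes one widely used formulation (an S-shaped sigmoid transfer function for binarization and a weighted error/size fitness); it is not taken from the thesis, and the weight `alpha` and the helper names are hypothetical. The error rate would come from a wrapped classifier, which is omitted here.

```python
import math
import random


def sigmoid_transfer(x):
    """S-shaped transfer function mapping a continuous position to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically safe branch for large negative x
    return e / (1.0 + e)


def binarize(position, rng):
    """Turn a continuous search-agent position into a feature mask (1 = keep)."""
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def fs_fitness(mask, error_rate, alpha=0.99):
    """Lower is better: weighted classification error plus fraction of features kept."""
    return alpha * error_rate + (1 - alpha) * sum(mask) / len(mask)


rng = random.Random(42)
mask = binarize([3.0, -3.0, 0.5, -0.2], rng)
print(mask, fs_fitness(mask, error_rate=0.12))
```

Weighting the error term heavily (here `alpha=0.99`) means subset size mainly breaks ties between masks of similar accuracy.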


A Case Study On Player Selection And Team Formation In Football With Machine Learning, Didem Abidin Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

Machine learning has been widely used in different domains to extract information from raw data. Sports has recently become a popular domain for researchers. Although score prediction for matches is the most common application of artificial intelligence in this domain, player selection and team formation are also application areas worth working on. This study examines the existing studies in the literature on player selection and team formation. The study makes two important contributions: the first is to apply seven different machine learning algorithms to our dataset to find the best player combination for …


Toward Tweet-Mining Framework For Extracting Terrorist Attack-Related Information And Reporting, Farkhund Iqbal, Rabia Batool, Benjamin C. M. Fung, Saiqa Aleem, Ahmed Abbasi, Abdul Rehman Javed Jan 2021

All Works

The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific …


Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi Dec 2020

Dissertations

Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, and environment and climate change monitoring. Traditional data mining algorithms do not scale well to big data due to the enormous number of data points and the velocity of their generation. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on mining big data using aggregated data as input. We developed a data structure for aggregating data at multiple resolutions. …
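The abstract does not describe the dissertation's data structure, so the following is only a hypothetical illustration of multi-resolution aggregation: 2-D points are summarized into grid cells that store a count and coordinate sums, and each coarser level is built from the level below by merging 2x2 blocks of cells rather than rescanning the raw data. The function name, cell layout, and doubling factor are all assumptions.

```python
from collections import defaultdict


def build_pyramid(points, levels, cell_size):
    """Aggregate 2-D points into grid cells at several resolutions (hypothetical).

    Level 0 uses square cells of width `cell_size`; each higher level doubles the
    cell width. Every cell stores [count, sum_x, sum_y], enough to recover the mean.
    """
    finest = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in points:
        cell = finest[(int(x // cell_size), int(y // cell_size))]
        cell[0] += 1
        cell[1] += x
        cell[2] += y
    pyramid = [dict(finest)]
    for _ in range(1, levels):
        coarser = defaultdict(lambda: [0, 0.0, 0.0])
        # Merge each 2x2 block of the previous level; raw points are not revisited
        for (i, j), (count, sx, sy) in pyramid[-1].items():
            agg = coarser[(i // 2, j // 2)]
            agg[0] += count
            agg[1] += sx
            agg[2] += sy
        pyramid.append(dict(coarser))
    return pyramid


pyramid = build_pyramid([(0.5, 0.5), (1.5, 0.5), (3.5, 3.5)], levels=2, cell_size=1.0)
for (i, j), (count, sx, sy) in pyramid[1].items():
    print((i, j), count, (sx / count, sy / count))
```

Because counts and sums are additive, a mining algorithm can work on whichever level fits its time and memory budget, trading resolution for speed.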


Structured Data Mining Networks, Time Series, And Time Series Of Networks, Lin Zhang Dec 2020

Legacy Theses & Dissertations (2009 - 2024)

The rate at which data is generated in modern applications has created an unprecedented demand for novel methods to effectively and efficiently extract insightful patterns. Methods aware of known domain-specific structure in the data tend to be advantageous. In particular, a joint temporal and networked view of observations offers a holistic lens on many real-world systems. Example domains abound: activity of social network users, gene interactions over time, the temporal load on infrastructure networks, and others. Existing analysis and mining approaches for such data exhibit limited quality and scalability due to their sensitivity to noise, missing observations, and the need …


Hybrid Deep Neural Networks For Mining Heterogeneous Data, Xiurui Hou Aug 2020

Dissertations

In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are needed to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.

The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and …


Mobile Location Data Analytics, Privacy, And Security, Yunhe Feng Aug 2020

Doctoral Dissertations

Mobile location data are ubiquitous in the digital world. People intentionally and unintentionally generate large amounts of location data when connecting to cellular networks or sharing posts on social networks. Because mobile devices normally communicate with nearby cell towers when outdoors, it is reasonable to infer human locations from cell tower coordinates. Many social networking platforms, such as Twitter, allow users to optionally geo-tag their posts, publishing personal locations to friends or to everyone. These location data are particularly useful for understanding mobile usage behaviors and human mobility patterns. Meanwhile, the public expresses great concern about the privacy and security of …


Detecting Undisclosed Paid Editing In Wikipedia, Nikesh Joshi Aug 2020

Boise State University Theses and Dissertations

Wikipedia is a free, open-collaboration-based online encyclopedia. The website has millions of pages maintained by thousands of volunteer editors. It is one of Wikipedia’s fundamental principles that pages are written from a neutral point of view and are maintained by volunteer editors for free, following well-defined guidelines in order to avoid or disclose any conflict of interest. However, there have been several known incidents where editors intentionally violated these guidelines in order to get paid (or even to extort money) for maintaining promotional spam articles without disclosing it.

This thesis addresses for the first time the …


Prevalence, Contents And Automatic Detection Of KL-SATD, Leevi Rantala, Mika Mantyla, David Lo Aug 2020

Research Collection School Of Computing and Information Systems

When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD from 33 software repositories with 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. We find that KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably, which distinguishes them from other comments. KL-SATD comment contents are similar to manually labeled SATD comments of prior work. Our machine learning classifier using logistic …
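Identifying KL-SATD candidates is essentially keyword matching over comments. A minimal sketch, assuming the comments have already been extracted from source files and using only the two keywords named in the abstract (a real study would use a longer, validated keyword list); the helper name is hypothetical and this is not the authors' classifier.

```python
import re

# Keywords named in the abstract; the case-sensitive match is an assumption
KL_SATD_KEYWORDS = ("TODO", "FIXME")
KL_SATD_PATTERN = re.compile(r"\b(?:" + "|".join(KL_SATD_KEYWORDS) + r")\b")


def kl_satd_share(comments):
    """Return the keyword-labeled comments and their percentage of all comments."""
    flagged = [c for c in comments if KL_SATD_PATTERN.search(c)]
    share = 100.0 * len(flagged) / len(comments) if comments else 0.0
    return flagged, share


comments = [
    "// TODO: remove this workaround once the API is fixed",
    "// compute the running total",
    "# FIXME maybe wrong for negative inputs",
    "# list of todos for the release",
]
flagged, share = kl_satd_share(comments)
print(share)  # → 50.0
```

The word boundaries keep "TODO" from matching inside longer tokens, and the lowercase "todos" in the last comment is deliberately not flagged.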


Joint Lattice Of Reconstructability Analysis And Bayesian Network General Graphs, Marcus Harris, Martin Zwick Jul 2020

Systems Science Faculty Publications and Presentations

This paper integrates the structures considered in Reconstructability Analysis (RA) and those considered in Bayesian Networks (BN) into a joint lattice of probabilistic graphical models. This integration and associated lattice visualizations are done in this paper for four variables, but the approach can easily be expanded to more variables. The work builds on the RA work of Klir (1985), Krippendorff (1986), and Zwick (2001), and the BN work of Pearl (1985, 1987, 1988, 2000), Verma (1990), Heckerman (1994), Chickering (1995), Andersson (1997), and others. The RA four variable lattice and the BN four variable lattice partially overlap: there are ten …


Reconstructability Analysis & Its Occam Implementation, Martin Zwick Jul 2020

Systems Science Faculty Publications and Presentations

This talk will describe Reconstructability Analysis (RA), a probabilistic graphical modeling methodology deriving from the 1960s work of Ross Ashby and developed in the systems community in the 1980s and afterwards. RA, based on information theory and graph theory, resembles and partially overlaps Bayesian networks (BN) and log-linear techniques, but also has some unique capabilities. (A paper explaining the relationship between RA and BN will be given in this special session.) RA is designed for exploratory modeling, although it can also be used for confirmatory hypothesis testing. In RA modeling, one either predicts some dependent variable (DV) from a set of independent variables (IVs) …


Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng Jul 2020

Research Collection School Of Computing and Information Systems

An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset, with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on information-theoretic metrics are proposed: PVS and P + VS. Extensive experiments are conducted on benchmark datasets of various sizes, and the results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experiment results …
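To make the idea of removing values (rather than features or instances) concrete, here is a minimal sketch that uses a simple frequency threshold as a stand-in for the paper's information-theoretic criteria: per feature, values whose empirical probability falls below `min_prob` are collapsed into one placeholder, shrinking the number of distinct values a model must represent. The threshold, placeholder, and function name are hypothetical; this is not PVS or P + VS.

```python
from collections import Counter


def select_values(rows, min_prob=0.1, placeholder="OTHER"):
    """Collapse rare categorical values, feature by feature (hypothetical helper).

    rows: list of equal-length tuples of categorical values.
    """
    if not rows:
        return []
    n = len(rows)
    kept = []
    for j in range(len(rows[0])):
        counts = Counter(row[j] for row in rows)
        # Keep only values whose empirical probability reaches the threshold
        kept.append({v for v, c in counts.items() if c / n >= min_prob})
    return [
        tuple(v if v in kept[j] else placeholder for j, v in enumerate(row))
        for row in rows
    ]


rows = [("a", "x")] * 8 + [("b", "x"), ("c", "y")]
reduced = select_values(rows, min_prob=0.2)
print(sorted({r[0] for r in reduced}))  # → ['OTHER', 'a']
```

Feature 0 drops from three distinct values to two while every row is kept, which is the trade-off value selection targets: a smaller model at a possible cost in accuracy.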


Detecting Credit Card Fraud: An Analysis Of Fraud Detection Techniques, William Lovo May 2020

Senior Honors Projects, 2020-current

Advancements in the modern age have brought many conveniences, one of them being credit cards. Giving individuals the ability to hold their entire purchasing power in pocket-sized plastic cards has made credit cards the preferred method for completing financial transactions. However, these systems are not infallible and may give criminals and other bad actors the opportunity to abuse them. Financial institutions and their customers lose billions of dollars every year to credit card fraud. To combat this issue, fraud detection systems are deployed to discover fraudulent activity after it has occurred. Such systems rely on advanced …


A Studying Of Webcontent Mining Tools, Rasha Hani Salman, Mahmood Zaki, Nadia A. Shiltag Apr 2020

Al-Qadisiyah Journal of Pure Science

Today the web has become an archive of information in many forms, such as text, sound, video, graphics, and multimedia. Over time, the World Wide Web has become crowded with diverse data, making the extraction of useful data a burdensome process; web mining therefore applies various data mining strategies to extract useful information from page content and hyperlinks. The fundamental purposes of web content mining (WCM) are to gather, organize, and classify content, providing the best information available on the web to users who need it. WCM tools are needed to examine HTML reports, content and …


Developing Big Data Projects In Open University Engineering Courses: Lessons Learned, Juan A. Lara, Aurea Anguera De Sojo, Shadi Aljawarneh, Robert P. Schumaker, Bassam Al-Shargabi Feb 2020

Computer Science Faculty Publications and Presentations

Big Data courses in which students are asked to carry out Big Data projects are becoming more frequent as a part of University Engineering curriculum. In these courses, instructors and students must face a series of special characteristics, difficulties and challenges that it is important to know about beforehand, so the lecturer can better plan the subject and manage the teaching methods in order to prevent students' academic dropout and low performance. The goal of this research is to approach this problem by sharing the lessons learned in the process of teaching e-learning courses where students are required to develop …


A Framework For Online Social Network Volatile Data Analysis: A Case For The Fast Fashion Industry, Anoud Bani-Hani, Feras Al-Obeidat, Elhadj Benkhelifa, Oluwasegun Adedugbe Jan 2020

All Works

Consumer satisfaction is important for any business, as it has been shown to be a major factor in consumer loyalty. Identifying satisfaction with products is also important because it allows businesses to alter production plans based on the level of consumer satisfaction with a product. Because consumer satisfaction data are very volatile for some products owing to their short requirement period, current consumer satisfaction must be identified within a short period, before the data become obsolete. The fast fashion industry, part of the fashion industry, is adopted as a case study in this research. …


Analysis And Optimization Of Combustion Characteristics Of Cement Kiln Cooperatively Disposing Domestic Refuse, Jingbing Wu, Hanqing Tang, Xu Jun Jan 2020

Journal of System Simulation

Because traditional methods can hardly analyze the complex combustion characteristics of a cement kiln that co-processes domestic refuse, data mining technology is introduced. A domestic cement plant is selected as the study object, and its operating data and relevant parameters are collected. The influence coefficient of each parameter on coal consumption and NOx emission is analyzed using the Stability Selection algorithm. A mathematical model of coal consumption and NOx emission is established with the Random Forest algorithm, and the key optimization parameters and their optimal values are obtained with the K-means clustering algorithm. The result shows that this method …


Asking Questions Is Easy, Asking Great Questions Is Hard: Constructing Effective Stack Overflow Questions, Jane W. Hsieh Jan 2020

Honors Papers

This paper explores and seeks to improve the ways in which Stack Overflow question posts can elicit answers. Using statistical data analysis approaches and reviews of existing literature, we pinpoint three key factors that are found in many previously successful/answerable questions. We then present a prototypical sidebar for the ask page that leverages these factors to dynamically (1) evaluate the quality of questions under construction, (2) display answer previews of relevant questions, and (3) scaffold the identified factors for subsequent askers during their question development processes.


Multitask-Based Association Rule Mining, Pelin Yildirim Taşer, Kökten Ulaş Birant, Derya Birant Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Recently, there has been growing interest in association rule mining (ARM) in various fields. However, standard ARM algorithms fail to discover rules for multitask problems because they do not consider task-oriented investigation and therefore ignore the correlation among tasks. Considering this situation, this paper proposes a novel algorithm, named multitask association rule miner (MTARM), that jointly discovers rules by considering multiple tasks. This paper also introduces two novel concepts: single-task rule and multiple-task rule. In the first phase of the proposed approach, highly frequent local rules (single-task rules) are explored for each task separately and …
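The distinction between single-task and multiple-task rules can be illustrated with a toy sketch, assuming rules of the form a -> b over single items, support and confidence thresholds applied within each task, and a rule treated as a multiple-task rule when it holds in at least `min_tasks` tasks. This simplification is not MTARM's procedure; the thresholds and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def task_rules(transactions, min_support, min_conf):
    """Single-task rules a -> b that are frequent and confident within one task."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    # Ordered pairs (a, b) co-occurring in a transaction, counted once per transaction
    pair_counts = Counter(
        p for t in transactions for p in permutations(sorted(set(t)), 2)
    )
    return {
        (a, b)
        for (a, b), c in pair_counts.items()
        if c / n >= min_support and c / item_counts[a] >= min_conf
    }


def multitask_rules(tasks, min_support=0.5, min_conf=0.8, min_tasks=2):
    """Per-task rule sets plus the rules shared by at least `min_tasks` tasks."""
    per_task = {
        name: task_rules(tx, min_support, min_conf) for name, tx in tasks.items()
    }
    counts = Counter(r for rules in per_task.values() for r in rules)
    shared = {r for r, c in counts.items() if c >= min_tasks}
    return per_task, shared


tasks = {
    "t1": [{"bread", "milk"}, {"bread", "milk"}, {"eggs"}],
    "t2": [{"bread", "milk"}, {"bread", "milk", "eggs"}],
}
per_task, shared = multitask_rules(tasks)
print(sorted(shared))
```

Here bread -> milk holds in both tasks and becomes a multiple-task rule, while eggs -> bread holds only in `t2` and stays a single-task rule.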


Searching For Needles In The Cosmic Haystack, Thomas Ryan Devine Jan 2020

Graduate Theses, Dissertations, and Problem Reports

Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must utilize parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare, e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification with no consensus on a best …
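The 0.05% figure shows why plain accuracy is a misleading metric for this problem. A minimal sketch, assuming a hypothetical data set of 1,000,000 candidates with that pulsar rate: a trivial classifier that never predicts "pulsar" is 99.95% accurate yet finds no pulsars at all.

```python
def trivial_classifier_metrics(n_total, positive_rate):
    """Accuracy and recall of a classifier that always predicts the negative class."""
    n_pos = round(n_total * positive_rate)
    n_neg = n_total - n_pos
    true_pos, true_neg = 0, n_neg  # every instance is labeled "not a pulsar"
    accuracy = (true_pos + true_neg) / n_total
    recall = true_pos / n_pos if n_pos else 0.0
    return accuracy, recall


accuracy, recall = trivial_classifier_metrics(1_000_000, 0.0005)
print(f"accuracy={accuracy:.4%}, recall={recall:.0%}")
```

This is why rare-class problems are usually evaluated with recall, precision, or related measures rather than accuracy alone.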


Energy Efficiency Data Mining And Scheduling Optimization Of Discrete Workshop, Yugu Lin, Wang Yan Dec 2019

Journal of System Simulation

This paper addresses the optimization of energy consumption in discrete workshops and establishes an energy efficiency optimization model for discrete workshops. The relationship between data mining and knowledge discovery is established. Scheduling knowledge is discovered through scheduling data preprocessing and the C4.5 decision tree learning algorithm. Energy efficiency optimization in discrete workshops is achieved by combining this scheduling knowledge with an improved differential evolution algorithm (IDE). The feasibility of the IDE algorithm is verified by comparison with TLBO, GA, and PSO.
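C4.5 selects each split by the gain ratio, i.e. information gain normalized by the split information of the candidate attribute. A minimal sketch of that criterion for a single categorical attribute; the toy "machine" attribute and energy labels in the usage lines are hypothetical, and the paper's actual scheduling features are not reproduced.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    # An attribute with a single value cannot split the data
    return gain / split_info if split_info > 0 else 0.0


machine = ["M1", "M1", "M2", "M2"]
energy_class = ["low", "low", "high", "high"]
print(gain_ratio(machine, energy_class))  # → 1.0
```

Dividing by split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.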


BullyNet: Unmasking Cyberbullies On Social Networks, Aparna Sankaran Dec 2019

Boise State University Theses and Dissertations

Social media has changed the way people communicate with each other and, consequently, has affected people's ability to empathize in both positive and negative ways. One of the most harmful consequences of social media is the rise of cyberbullying, which tends to be more sinister than traditional bullying given that online records typically live on the internet for a long time and are hard to control. In this thesis, we present a three-phase algorithm, called BullyNet, for detecting cyberbullies on the Twitter social network. We exploit bullying tendencies by proposing a robust method for constructing a cyberbullying signed network. BullyNet analyzes …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions because of the low-energy sound waves it uses. However, visual interpretation of ultrasound images is time-consuming and usually produces many false alarms due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, they have not solved the problem, owing to the complex nature of the images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, which is the focus of interest of a biopsy, is usually …


Feature Space Modeling For Accurate And Efficient Learning From Non-Stationary Data, Ayesha Akter Oct 2019

Doctoral Dissertations

A non-stationary dataset is one whose statistical properties, such as the mean, variance, correlation, and probability distribution, change over a specific interval of time. By contrast, a stationary dataset is one whose statistical properties remain constant over time. Apart from its volatile statistical properties, non-stationary data poses other challenges, such as time and memory management under limited computational resources, mostly caused by recent advancements in data collection technologies that generate a variety of data at an alarming pace and volume. Additionally, when the collected data is complex, managing data complexity, emerging from its dimensionality and …
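The definition above suggests a simple diagnostic: compare summary statistics across consecutive windows. A minimal sketch, assuming fixed-size non-overlapping windows and hypothetical tolerance thresholds; it is a heuristic check rather than a formal stationarity test, and it is not the dissertation's method.

```python
from statistics import mean, pvariance


def window_stats(series, window):
    """(mean, population variance) for each full, non-overlapping window."""
    return [
        (mean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]


def looks_non_stationary(series, window, mean_tol, var_tol):
    """Flag the series if window means or variances drift beyond the tolerances."""
    stats = window_stats(series, window)
    if len(stats) < 2:
        return False  # not enough windows to compare
    means = [m for m, _ in stats]
    variances = [v for _, v in stats]
    return (max(means) - min(means) > mean_tol) or (
        max(variances) - min(variances) > var_tol
    )


shifted = [0, 1] * 10 + [5, 6] * 10  # mean jumps halfway through
steady = [0, 1] * 20
print(looks_non_stationary(shifted, 10, 1.0, 1.0), looks_non_stationary(steady, 10, 1.0, 1.0))
```

Only the first half of the definition (mean and variance) is checked here; changes in correlation or in the full distribution would need further statistics.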


Applied Deep Learning In Intelligent Transportation Systems And Embedding Exploration, Xiaoyuan Liang Aug 2019

Dissertations

Deep learning techniques have achieved tremendous success in many real applications in recent years and show great potential in many areas, including transportation. Even though transportation has become increasingly indispensable in people’s daily lives, its related problems, such as traffic congestion and energy waste, have not been completely solved, and some have become even more critical. This dissertation uses deep learning to address the following fundamental problems in the transportation field: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization …


Phenomena Of Social Dynamics In Online Games, Essa Alhazmi Jul 2019

USF Tampa Graduate Theses and Dissertations

Online communities exhibit dynamic social phenomena that, if understood, can both influence the design of technical platforms and inform theories about general social dynamics. As their popularity grows, online games provide a rich record of social dynamics that can contribute to understanding human behavior. This dissertation studies two phenomena of social dynamics at large scale using data traces from online games. The first phenomenon is team formation and the second is player mobility between gaming servers.

This dissertation first presents a framework for collecting data from online games through crawling. It includes the data sources and the tools used for data …


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko Jun 2019

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced if the cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in …
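Frequent itemset mining, which the paper builds on, can be sketched with the classic level-wise (Apriori-style) search. The transactions below are hypothetical: each stands for one surveyed object and lists illustrative topological relation labels. The paper's refined topological model is not reproduced here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unions of frequent sets that have exactly k items
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


objects = [
    {"touches:road", "within:parcel"},
    {"touches:road", "within:parcel"},
    {"touches:road"},
    {"disjoint:river"},
]
for itemset, support in frequent_itemsets(objects, min_support=0.5).items():
    print(sorted(itemset), support)
```

A frequent pair such as touches:road with within:parcel is the kind of pattern that could then be read as a candidate topological constraint for an object class.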