Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Theses/Dissertations

Data mining

Articles 61 - 90 of 125

Full-Text Articles in Computer Sciences

Spatio-Temporal Patterns Of GPS Trajectories Using Association Rule Mining, Vivek Kumar Sharma May 2016

Computer Science and Engineering Theses

The availability of location-tracking devices such as GPS, cellular networks and other devices makes it possible to log the locations of a person or device automatically. This creates spatio-temporal datasets of users' movements, with features such as the latitude and longitude of a particular location on a specific day and time. Using these features, different patterns of user movement can be collected, queried and analyzed. In this research work, we focus on users' movement patterns and their frequent movements at a particular place, day or time interval. To achieve this, we used association rule mining based on the Apriori algorithm to find interesting movement …
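The abstract names Apriori-based association rule mining over place, day and time features. The sketch below is the textbook level-wise Apriori search plus rule generation, not the thesis's own implementation; it assumes each GPS record has already been discretized into items such as a place label, weekday and time slot (the item names and the toy visits are hypothetical).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori search: return {frozenset(itemset): support}
    for every itemset whose support is at least min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    # Level 1 candidates: every individual item seen in the data
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of transactions containing the candidate
        level = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets;
        # prune step: keep a candidate only if all its k-subsets are frequent
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent

def association_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence) rules."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules

# Hypothetical discretized visits: one transaction per (user, visit)
visits = [
    {"place=library", "day=Mon", "slot=morning"},
    {"place=library", "day=Mon", "slot=morning"},
    {"place=gym", "day=Mon", "slot=evening"},
    {"place=library", "day=Wed", "slot=morning"},
]
frequent = apriori(visits, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
```

On this toy data, a rule such as `{slot=morning} -> {place=library}` (support 0.75, confidence 1.0) is the kind of "frequent movement" pattern the abstract describes. The choice of time-slot granularity is an assumption and strongly affects which rules appear.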


Identifying Terrorist Affiliations Through Social Network Analysis Using Data Mining Techniques, Govand A. Ali Apr 2016

Information Technology Master Theses

In a technologically enabled world, local ideologically inspired warfare becomes global all too quickly. Terrorist groups such as Al Qaeda and ISIS (Daesh), in particular, have successfully used modern computing technology and social networking environments to broadcast their message, recruit new members, and plot attacks. This is especially true for platforms such as Twitter and encrypted mobile apps like Telegram or the clandestine Alrawi. Because early detection of such activity is crucial to attack prevention, data mining techniques have become increasingly important in the fight against the spread of global terrorist activity. This study employs data mining tools to mine Twitter for …


Unsupervised Learning Framework For Large-Scale Flight Data Analysis Of Cockpit Human Machine Interaction Issues, Abhishek B. Vaidya Apr 2016

Open Access Theses

As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human-machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in a large-scale Flight Operational Quality Assurance (FOQA) dataset. The proposed tool is developed using a multi-level clustering framework, where a set of basic …


Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri Jan 2016

Wayne State University Dissertations

Predicting time-to-event from longitudinal data, where different events occur at different time points, is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right-censored data (also called survival data): instances whose event of interest has not yet been observed within a given observation time window are considered right-censored. Effective models for predicting time-to-event labels from such right-censored data with good accuracy can have a significant impact in these …
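The abstract does not say which models the dissertation builds. As a hedged illustration of why right censoring needs special handling, here is the classic Kaplan-Meier estimator, a standard non-parametric baseline swapped in for illustration (not the dissertation's method): a censored instance stays in the risk set up to its censoring time but is never counted as an event. The toy times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  observed time per instance (event time, or censoring time)
    events: 1 if the event was observed, 0 if the instance is right-censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group every instance observed at time t (events and censorings)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Instances censored at t still count in the risk set at t
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone observed at t leaves the risk set afterwards
        at_risk -= n_leaving
    return curve

# Hypothetical data: the second and fifth instances are right-censored
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

On this toy input the estimated survival drops to about 0.8, 0.6 and 0.3 at t = 2, 3 and 5. Simply discarding the censored instances, or treating them as events, would bias these values, which is the difficulty the abstract points to.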


Speaker Identification In Live Events Using Twitter, Minumol Joseph Dec 2015

Computer Science and Engineering Theses

The prevalence of social media has given rise to a new research area. Data from social media is now being used in research to gather deeper insights into many different fields. Twitter is one of the most popular microblogging websites. Users express themselves on a wide variety of topics in 140 characters or fewer. Oftentimes, users “tweet” about issues and subjects that are gaining in popularity, a great example being politics: any development in politics frequently results in a tweet of some form. The research which follows focuses on identifying a speaker’s name at a live event by collecting and …


Clustering-Based Personalization, Seyed Nima Mirbakhsh Sep 2015

Electronic Thesis and Dissertation Repository

Recommendation systems have been one of the most prominent emerging technologies of the last decade and are a key part of the e-commerce ecosystem. Businesses offer a wide variety of items and content through different channels such as the Internet, smart TVs and digital screens. For some businesses, the number of these items exceeds millions, so users can have trouble finding the products they are looking for. Recommendation systems address this problem by providing powerful methods that enable users to filter through large information and product spaces based on their preferences. Moreover, users have different preferences. Thus, businesses can employ recommendation …


Data Mining In Computational Proteomics And Genomics, Yang Song May 2015

Dissertations

This dissertation addresses data mining in bioinformatics by investigating two important problems, namely peak detection and structure matching. Peak detection is useful for biological pattern discovery while structure matching finds many applications in clustering and classification.

The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a …


Novel Computational Methods For Transcript Reconstruction And Quantification Using RNA-Seq Data, Yan Huang Jan 2015

Theses and Dissertations--Computer Science

The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition, such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We …


Indirect Association Rule Mining For Crime Data Analysis, Riley Englin Jan 2015

EWU Masters Thesis Collection

Crime data analysis is difficult to undertake. There are continuous efforts to analyze crime and determine ways to combat it, but the task is a complex one. Additionally, domestic violence crimes are hard to detect and even more difficult to predict. Recently, police have taken steps to better classify domestic violence cases. The problem is that there is limited research into this category of crime, possibly due to its sensitive nature or the lack of data available for analysis; therefore, little is known about these crimes and how they relate to others. The objectives of …


Topic Analysis And Application Using Nonnegative Matrix Factorizations (NMF), Xin Wang Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

Managing a large and growing amount of information is a central goal of modern computer science. Data repositories of texts, images and videos have become widely accessible, necessitating good methods of retrieval, organization and exploration.


Pattern Mining And Events Discovery In Molecular Dynamics Simulations Data, Shobhit Sandesh Shakya Jan 2015

LSU Doctoral Dissertations

The molecular dynamics simulation method is widely used to calculate and understand a wide range of material properties. Much research effort has focused on simulation techniques, but relatively little work has been done on methods for analyzing the simulation results. Large-scale simulations usually generate massive amounts of data, which makes manual analysis infeasible, particularly when it is necessary to examine the simulation results in detail. In this dissertation, we propose a system that uses computational methods to automatically analyze simulation data, which represent atomic position time series. The system identifies, in an automated fashion, the …


Social Fingerprinting: Identifying Users Of Social Networks By Their Data Footprint, Denise Koessler Gosnell Dec 2014

Doctoral Dissertations

This research defines, models, and quantifies a new metric for social networks: the social fingerprint. Just as one's fingers leave behind a unique trace in a print, this dissertation introduces and demonstrates that the manner in which people interact with other accounts on social networks creates a unique data trail. Accurate identification of a user's social fingerprint can address the growing demand for improved techniques in unique user account analysis, computational forensics and social network analysis.

In this dissertation, we theorize, construct and test novel software and methodologies which quantify features of social network data. All approaches and methodologies are …


A Knowledge Discovery Approach For The Detection Of Power Grid State Variable Attacks, Nathan Wallace Jul 2014

Doctoral Dissertations

As power system technologies grow more sophisticated, the number of system state parameters being recorded also increases. This data not only provides an opportunity for monitoring and diagnostics of a power system, but also creates an environment in which security can be maintained. Extracting relevant information from this pool of data is one of the key challenges yet to be met in the smart grid. The potential exists for innovative power grid cybersecurity applications that harness the information gained from advanced analytics. Such analytics can be based on the extraction …


Ranking-Based Approaches For Localizing Faults, Lucia Lucia Jun 2014

Dissertations and Theses Collection (Open Access)

A fault is the root cause of a program failure, in which a program behaves differently from its intended behavior. Finding or localizing faults is often laborious (especially for complex programs), yet it is an important task in the software lifecycle. An automated technique that can accurately and quickly identify the faulty code is greatly needed to reduce the cost of software debugging. Many fault localization techniques assume that faults are localizable, i.e., each fault manifests in only one or a few lines of code that are close to one another. To verify this assumption, we study how faults spread …


Corl8: A System For Analyzing Diagnostic Measures In Wireless Sensor Networks, Loren Klingman May 2014

All Theses

Due to an increasing demand to monitor the physical world, researchers are deploying wireless sensor networks more than ever before. These networks comprise a large number of sensors integrated with small, low-power wireless transceivers used to transmit data to a central processing and storage location. These devices are often deployed in harsh, volatile locations, which increases their failure rate and decreases the rate at which packets can be successfully transmitted. Existing sensor debugging tools, such as Sympathy and EmStar, rely on add-in network protocols to report status information, and to collectively diagnose network problems. Some protocols rely on a central …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as predicting user affiliations in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict user affiliation attributes that suggest participation in different social groups.


Graph Mining And Module Detection In Protein-Protein Interaction Networks, Ru Shen Jan 2014

Legacy Theses & Dissertations (2009 - 2024)

Graphs are intuitive representations of relational data. Graphs have been widely used to represent biological molecular networks that operate in the living systems. In the study of systems biology, using graph mining techniques and graph-theory-based algorithms to …


Roughened Random Forests For Binary Classification, Kuangnan Xiong Jan 2014

Legacy Theses & Dissertations (2009 - 2024)

Binary classification plays an important role in many decision-making processes. Random forests can build a strong ensemble classifier by combining weaker classification trees that are de-correlated. The strength and correlation among individual classification trees are the key factors that contribute to the ensemble performance of random forests. We propose roughened random forests, a new set of tools which show further improvement over random forests in binary classification. Roughened random forests modify the original dataset for each classification tree and further reduce the correlation among individual classification trees. This data modification process is composed of artificially imposing missing data that are …


Multi-Threaded Implementation Of Association Rule Mining With Visualization Of The Pattern Tree, Eera Gupta Jan 2014

LSU Master's Theses

The motor vehicle fatality rate in the United States was reported by the NHTSA (National Highway Traffic Safety Administration) to be 10.69 per 100,000 population in 2012, an increase of 0.27 per 100,000 over the 2011 rate. According to these reports, many factors contribute to the rising fatality rate, such as driving under the influence, texting while driving, and various weather phenomena. Decision makers need to analyze the factors contributing to the increase in the accident rate in order to take appropriate measures. Current methods used to perform the data analysis …


Knowledge Extraction From Survey Data Using Neural Networks, Imran Ahmed Khan Jul 2013

Computer Science Theses

Surveys are an important tool for researchers. Survey attributes are typically discrete data measured on a Likert scale. Collected survey responses contain an enormous amount of data, so it is increasingly important to develop powerful means of clustering such data and extracting knowledge that could help in decision-making. Clustering becomes complex when the number of survey attributes is large. Another major issue with Likert-scale data is the uniqueness of tuples: a large number of unique tuples may result in a large number of patterns, which may increase the complexity of the knowledge extraction process. Also, …


Localizing State-Dependent Faults Using Associated Sequence Mining, Shaimaa Ali May 2013

Electronic Thesis and Dissertation Repository

In this thesis, we developed a new fault localization process to localize faults in object-oriented software. The process is built upon the "Encapsulation" principle and aims to locate state-dependent discrepancies in the software's behavior. We experimented with the proposed process on 50 seeded faults in 8 subject programs and were able to locate the faulty class in 100% of the cases when objects with constant states were taken into consideration, while we missed 24% of the faults when these objects were not considered. We also developed a customized data mining technique, "Associated sequence mining", to be used in …


Document Collection Visualization And Clustering Using An Atom Metaphor For Display And Interaction, Khanh V. Nghi May 2013

Theses and Dissertations - UTB/UTPA

Visual data mining has proven to be of high value in exploratory data analysis and data mining because it provides intuitive feedback on data analysis and supports decision-making activities. Several visualization techniques have been developed for cluster discovery, such as Grand Tour, HD-Eye and Star Coordinates. They are very useful tools that visualize data in 2D or 3D; however, they are not simple to use for untrained users. This thesis proposes a new approach to building a 3D clustering visualization system for document clustering using the k-means algorithm. A cluster will be represented by a neutron (centroid) and …
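The thesis clusters documents with k-means and draws each centroid as the "neutron" of an atom. A minimal sketch of plain k-means (Lloyd's algorithm) follows, assuming the documents have already been turned into numeric vectors such as term counts; the toy vectors and the random initialization are hypothetical choices, and the 3D atom visualization itself is not shown.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: return (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen input points
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c, p=p: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical term-count vectors for four documents over three terms
docs = [[5, 0, 1], [4, 1, 0], [0, 5, 4], [1, 4, 5]]
labels, centroids = kmeans(docs, k=2)
```

In a display like the one described, each entry of `centroids` would be drawn as a cluster's center and its member documents arranged around it. For real text collections, TF-IDF weighting and cosine distance are common alternatives to the raw counts and Euclidean distance assumed here.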


On Identifying Critical Nuggets Of Information During Classification Task, David Sathiaraj Jan 2013

LSU Doctoral Dissertations

In large databases, there may exist critical nuggets: small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focused on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the …


A Novel Computational Framework For Transcriptome Analysis With RNA-Seq Data, Yin Hu Jan 2013

Theses and Dissertations--Computer Science

The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected …


Exploring The Learnability Of Numeric Datasets, Di Lin Jan 2013

LSU Doctoral Dissertations

When doing classification, it has often been observed that datasets may exhibit different levels of difficulty with respect to how accurately they can be classified. That is, some datasets can be classified very accurately by many classification algorithms, while others cannot be classified with high accuracy by any classifier. Based on this observation, we try to address the following problems: a) what are the factors that make a dataset easy or difficult to classify accurately? b) how to use such factors to predict the difficulty of unclassified datasets? and c) how to …


Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini Oct 2012

Doctoral Dissertations

Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of the "Big Data" synthesis needed to slow the widening gap between the rate at which data are collected and the rate at which they are analyzed for knowledge. This has led to a renewed need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness that results from the high dimensionality of the datasets in such domains, which is particularly detrimental to the …


Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin Oct 2012

Computational Modeling & Simulation Engineering Theses & Dissertations

Initialization is one of the most important processes for obtaining successful results from a simulation. However, initialization is challenging when 1) a simulation requires hundreds or even thousands of input parameters or 2) the simulation must be re-initialized because of different initial conditions or runtime errors. These challenges lead to the modeler spending more time initializing a simulation and may lead to errors due to poor input data.

This thesis proposes two semi-automatic simulation initialization approaches that provide initialization using data mining from structured and unstructured data formats from local and web data sources. First, the System Initialization with Retrieval (SIR) …


A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson Aug 2012

Theses and Dissertations

Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in environmental data sets, as in many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …


Data Mining Of Tetraloop-Tetraloop Receptors In RNA XML Files, Sinan Ramazanoglu May 2012

Theses

RNA (ribonucleic acid) motifs are tertiary structures that play an important role in the folding mechanism of the RNA molecule. The overall function of an RNA motif depends on its specific base-pair (bp) sequence that constitutes the secondary structure. Data mining is a novel method for both discovering potential tertiary structures within DNA (deoxyribonucleic acid), RNA, and protein molecules and storing the information in databases. The RNA motif of interest is the tetraloop-tetraloop receptor, which is composed of a highly conserved 11 nt (nucleotide) sequence and a tetraloop with the generic form of GNRA (where N = any base …


Analysis And Characterization Of Author Contribution Patterns In Open Source Software Development, Quinn Carlson Taylor Mar 2012

Theses and Dissertations

Software development is a process fraught with unpredictability, in part because software is created by people. Human interactions add complexity to development processes, and collaborative development can become a liability if not properly understood and managed. Recent years have seen an increase in the use of data mining techniques on publicly-available repository data with the goal of improving software development processes, and by extension, software quality. In this thesis, we introduce the concept of author entropy as a metric for quantifying interaction and collaboration (both within individual files and across projects), present results from two empirical observational studies of open-source …
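The abstract introduces author entropy without giving its formula. One natural reading, assumed here rather than taken from the thesis, is the Shannon entropy of each author's share of a file's contributions: 0 bits when a single author wrote everything, rising as work is spread more evenly. The contribution unit (e.g. lines added) and the optional normalization are illustrative assumptions.

```python
import math

def author_entropy(contributions, normalize=False):
    """Shannon entropy (in bits) of authors' shares of a file's contributions.

    contributions: {author: amount}, e.g. lines added (an assumed unit).
    normalize: divide by log2(#authors) so the result lies in [0, 1].
    """
    total = sum(contributions.values())
    shares = [c / total for c in contributions.values() if c > 0]
    entropy = sum(-p * math.log2(p) for p in shares)
    if normalize:
        return entropy / math.log2(len(shares)) if len(shares) > 1 else 0.0
    return entropy

# Hypothetical file: alice wrote half, bob and carol a quarter each
print(author_entropy({"alice": 2, "bob": 1, "carol": 1}))  # 1.5
```

Normalizing makes files with different numbers of authors comparable: a value near 1 means contributions are spread almost evenly, while a value near 0 means one author dominates.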