Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

Articles 151 - 180 of 342

Full-Text Articles in Computer Sciences

Unsupervised Learning Framework For Large-Scale Flight Data Analysis Of Cockpit Human Machine Interaction Issues, Abhishek B. Vaidya Apr 2016

Open Access Theses

As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human-machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in large-scale Flight Operational Quality Assurance (FOQA) datasets. The proposed tool is developed using a multi-level clustering framework, where a set of basic …


A Cloud-Based Framework For Smart Permit System For Buildings, Magdalini Eirinaki, Subhankar Dhar, Shishir Mathur Jan 2016

Faculty Publications

In this paper we propose a novel cloud-based platform for a building permit system that is efficient, user-friendly, and transparent, and that has a quick turnaround time for homeowners. Compared to the existing permit systems, the proposed smart city permit framework provides a pre-permitting decision workflow, and incorporates a data analytics and mining module that enables the continuous improvement of a) the end user experience, by analyzing explicit and implicit user feedback, and b) the permitting and urban planning process, allowing a gleaning of key insights for real estate development and city planning purposes, by analyzing how users interact with the system depending on …


Novel Dynamic Partial Reconfiguration Implementations Of The Support Vector Machine Classifier On FPGA, Hanaa Hussain, Khaled Benkrid, Hüseyin Şeker Jan 2016

Turkish Journal of Electrical Engineering and Computer Sciences

The support vector machine (SVM) is a powerful classifier that has been shown to be capable of dealing with high-dimensional data. However, its complexity increases its computational power requirements. Recent technologies, including postgenome data of a high-dimensional nature, add further complexity to the construction of SVM classifiers. In order to overcome this problem, hardware implementations of the SVM classifier have been proposed to benefit from parallelism to accelerate the SVM. On the other hand, those implementations offer limited flexibility in terms of changing parameters and require the reconfiguration of the whole device. The latter interrupts the operation …


Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri Jan 2016

Wayne State University Dissertations

Predicting time-to-event from longitudinal data where different events occur at different time points is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right censored data (also called survival data). Instances whose event of interest is not observed within a given observation time window are considered to be right censored. Effective models for predicting time-to-event labels from such right censored data with good accuracy can have a significant impact in these …
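
The right-censoring idea in the abstract above can be made concrete with a small example. The sketch below uses the Kaplan-Meier estimator, a standard survival-analysis technique chosen here purely for illustration; it is not the method proposed in the dissertation, and the function name `kaplan_meier` and the toy data are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time of each subject (hypothetical toy data below)
    observed:  True if the event occurred, False if the subject was right censored
    """
    # Count only real events; censored subjects contribute no event.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Everyone still under observation at time t is at risk,
        # including subjects who are censored later on.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Five subjects; the third and fifth left the study before any event was seen.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for time, probability in curve:
    print(f"t={time}: S(t)={probability:.2f}")
```

Censored subjects never count as events, but they stay in the at-risk set until their censoring time, which is how the estimator uses their partial information instead of discarding it.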


IoT+Small Data: Transforming In-Store Shopping Analytics And Services, Meera Radhakrishnan, Sougata Sen, Vigneshwaran Subbaraju, Archan Misra, Rajesh Balan Jan 2016

Research Collection School Of Computing and Information Systems

We espouse a vision of small data-based immersive retail analytics, where a combination of sensor data, from personal wearable-devices and store-deployed sensors & IoT devices, is used to create real-time, individualized services for in-store shoppers. Key challenges include (a) appropriate joint mining of sensor & wearable data to capture a shopper’s product level interactions, and (b) judicious triggering of power-hungry wearable sensors (e.g., camera) to capture only relevant portions of a shopper’s in-store activities. To explore the feasibility of our vision, we conducted experiments with 5 smartwatch-wearing users who interacted with objects placed on cupboard racks in our lab (to …


Speaker Identification In Live Events Using Twitter, Minumol Joseph Dec 2015

Computer Science and Engineering Theses

The prevalence of social media has given rise to a new research area. Data from social media is now being used in research to gather deeper insights into many different fields. Twitter is one of the most popular microblogging websites. Users express themselves on a variety of different topics in 140 characters or less. Oftentimes, users “tweet” about issues and subjects that are gaining in popularity, a great example being politics. Any development in politics frequently results in a tweet of some form. The research which follows focuses on identifying a speaker’s name at a live event by collecting and …


Capstone Projects Mining System For Insights And Recommendations, Melvrivk Aik Chun Goh, Swapna Gottipati, Venky Shankararaman Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we present a classification-based system to discover knowledge and trends in higher education students’ projects. Essentially, the educational capstone projects provide an opportunity for students to apply what they have learned and prepare themselves for industry needs. Therefore, mining such projects gives insight into students’ experiences as well as industry project requirements and trends. In particular, we mine capstone projects executed by Information Systems students to discover patterns and insights related to people, organization, domain, industry needs and time. We build a capstone projects mining system (CPMS) based on classification models that leverage text mining, natural …


IntelligShop: Enabling Intelligent Shopping In Malls Through Location-Based Augmented Reality, Aditi Adhikari, Vincent W. Zheng, Hong Cao, Miao Lin, Yuan Fang, Kevin Chen-Chuan Chang Nov 2015

Research Collection School Of Computing and Information Systems

Shopping experience is important for both citizens and tourists. We present IntelligShop, a novel location-based augmented reality application that supports an intelligent shopping experience in malls. As the key functionality, IntelligShop provides an augmented reality interface: people can simply use ubiquitous smartphones to face mall retailers, and IntelligShop will then automatically recognize the retailers and fetch their online reviews from various sources (including blogs, forums and publicly accessible social media) to display on the phones. Technically, IntelligShop addresses two challenging data mining problems, including robust feature learning to support heterogeneous smartphones in localization and learning to query for automatically gathering the retailer content …


The Importance Of Being Isolated: An Empirical Study On Chromium Reviews, Subhajit Datta, Devarshi Bhatt, Manish Jain, Proshanta Sarkar, Santonu Sarkar Oct 2015

Research Collection School Of Computing and Information Systems

As large-scale software development has become more collaborative, and software teams more globally distributed, several studies have explored how developer interaction influences software development outcomes. The emphasis so far has been largely on outcomes like defect count, the time to close modification requests, etc. In this paper, we examine data from the Chromium project to understand how different aspects of developer discussion relate to the closure time of reviews. On the basis of analyzing reviews discussed by 2000+ developers, our results indicate that quicker closure of reviews owned by a developer relates to higher reception of information and insights …


Clustering-Based Personalization, Seyed Nima Mirbakhsh Sep 2015

Electronic Thesis and Dissertation Repository

Recommendation systems have emerged over the last decade as one of the key parts of the e-commerce ecosystem. Businesses offer a wide variety of items and content through different channels such as the Internet, smart TVs, digital screens, etc. For some businesses, the number of these items runs into the millions. Therefore, users can have trouble finding the products that they are looking for. Recommendation systems address this problem by providing powerful methods which enable users to filter through large information and product spaces based on their preferences. Moreover, users have different preferences. Thus, businesses can employ recommendation …


Rough-Fuzzy Hybrid Approach For Identification Of Bio-Markers And Classification On Alzheimer's Disease Data, Changsu Lee, Chiou-Peng Lam, Martin Masek Jul 2015

Martin Masek

A new approach is proposed in this paper for identification of biomarkers and classification on Alzheimer's disease data by employing a rough-fuzzy hybrid approach called ARFIS (a framework for Adaptive TS-type Rough-Fuzzy Inference Systems). In this approach, the entropy-based discretization technique is employed first on the training data to generate clusters for each attribute with respect to the output information. The rough set-based feature reduction method is then utilized to reduce the number of features in a decision table obtained using the cluster information. Another rough set-based approach is employed for the generation of decision rules. After the construction and …


Exploratory Data Modeling Of Traumatic Brain Injury, Martin Zwick Jun 2015

Systems Science Faculty Publications and Presentations

A short presentation of an analysis of data from Dr. Megan Preece on traumatic brain injury, the first in a series of planned secondary analyses of multiple TBI data sets. The analysis employs the systems methodology of reconstructability analysis (RA), utilizing both variable- and state-based and both neutral and directed models. The presentation explains RA and illustrates the results it can obtain. Unlike the confirmatory approach standard to most data analyses, this methodology is designed for exploratory modeling. It thus allows the discovery of unanticipated associations among variables, including multi-variable interaction effects of unknown form. It offers the opportunity for …


Should We Use The Sample? Analyzing Datasets Sampled From Twitter's Stream API, Yazhe Wang, Jamie Callan, Baihua Zheng Jun 2015

Research Collection School Of Computing and Information Systems

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill …


Author Topic Model-Based Collaborative Filtering For Personalized POI Recommendations, Shuhui Jiang, Xueming Qian, Jialie Shen, Yun Fu, Tao Mei Jun 2015

Research Collection School Of Computing and Information Systems

Social media has given rise to a continuing need for automatic travel recommendations. Collaborative filtering (CF) is the most well-known approach. However, existing approaches generally suffer from various weaknesses. For example, sparsity can significantly degrade the performance of traditional CF. If a user visits only a few locations, accurately identifying similar users becomes very challenging due to the lack of sufficient information for effective inference. Moreover, existing recommendation approaches often ignore rich user information like textual descriptions of photos which can reflect users' travel preferences. The topic model (TM) method is an effective way to solve the "sparsity problem," but is still far …


Data Mining In Computational Proteomics And Genomics, Yang Song May 2015

Dissertations

This dissertation addresses data mining in bioinformatics by investigating two important problems, namely peak detection and structure matching. Peak detection is useful for biological pattern discovery while structure matching finds many applications in clustering and classification.

The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a …


Data Mining Temporal Work Patterns Of Programming Student Populations, Dale E. Parson, Lori Bogumil, Allison Seidel Apr 2015

Computer Science and Information Technology Faculty

This paper reports the second stage of a study of the correlations between the temporal work patterns of computer programming students and their success or failure as measured by programming project assignment grades and related metrics. The first stage confirmed the importance for most students of getting an early start on a programming project, and it also uncovered the fact that some student groups perform well with late starts, suggesting the likelihood that they engage in the productive practice of active procrastination. The second most important factor for success is the average length of assignment work sessions. Session lengths from …


Improving Software Quality And Productivity Leveraging Mining Techniques: [Summary Of The Second Workshop On Software Mining, At ASE 2013], Ming Li, Hongyu Zhang, David Lo, Lucia Lucia Jan 2015

Research Collection School Of Computing and Information Systems

The second International Workshop on Software Mining (Soft-mine) was held on the 11th of November 2013. The workshop was held in conjunction with the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) in Silicon Valley, California, USA. The workshop brought together researchers who are interested in mining various types of software-related data and in applying data mining techniques to support software engineering tasks. During the workshop, seven papers on software mining and behavior models, execution trace mining, and bug localization and fixing were presented. One of the papers received the best paper award. Furthermore, there were two invited talk …


A Theory Of Name Resolution, Pierre Néron, Andrew Tolmach, Eelco Visser, Guido Wachsmuth Jan 2015

Computer Science Faculty Publications and Presentations

We describe a language-independent theory for name binding and resolution, suitable for programming languages with complex scoping rules including both lexical scoping and modules. We formulate name resolution as a two-stage problem. First a language-independent scope graph is constructed using language-specific rules from an abstract syntax tree. Then references in the scope graph are resolved to corresponding declarations using a language-independent resolution process. We introduce a resolution calculus as a concise, declarative, and language-independent specification of name resolution. We develop a resolution algorithm that is sound and complete with respect to the calculus. Based on the resolution calculus we …
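
The two-stage idea in the abstract above (build a scope graph, then resolve references against it) can be illustrated with a deliberately reduced sketch. It assumes lexical parent edges only and omits modules, imports, and the paper's resolution calculus; the `Scope` class, the `resolve` function, and the declaration labels are hypothetical names introduced for this example.

```python
class Scope:
    """A node in a simplified scope graph with a single lexical parent edge."""

    def __init__(self, parent=None):
        self.parent = parent
        self.declarations = {}  # name -> declaration label

    def declare(self, name, declaration):
        self.declarations[name] = declaration


def resolve(scope, name):
    """Walk parent edges outward until a declaration of `name` is found."""
    while scope is not None:
        if name in scope.declarations:
            return scope.declarations[name]
        scope = scope.parent
    return None  # unresolved reference


# Stage 1: build the graph (done by hand here instead of from a syntax tree).
global_scope = Scope()
global_scope.declare("x", "x@line1")
global_scope.declare("f", "f@line2")
function_scope = Scope(parent=global_scope)
function_scope.declare("x", "x@line3")

# Stage 2: resolve references against the graph.
inner_x = resolve(function_scope, "x")    # the inner declaration shadows the outer one
outer_f = resolve(function_scope, "f")    # found by following the parent edge
missing = resolve(function_scope, "y")    # no declaration is reachable
```

Keeping graph construction separate from resolution means only the first stage needs language-specific rules, which is the separation the abstract describes.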


E-Mail Authorship Attribution Using Customized Associative Classification, Michael R. Schmid, Farkhund Iqbal, Benjamin C.M. Fung Jan 2015

All Works

E-mail communication is often abused for conducting social engineering attacks including spamming, phishing, identity theft and for distributing malware. This is largely attributed to the problem of anonymity inherent in the standard electronic mail protocol. In the literature, authorship attribution is studied as a text categorization problem where the writing styles of individuals are modeled based on their previously written sample documents. The developed model is employed to identify the most plausible writer of the text. Unfortunately, most existing studies focus solely on improving predictive accuracy and not on the inherent value of the evidence collected. In this study, we …


Novel Computational Methods For Transcript Reconstruction And Quantification Using RNA-Seq Data, Yan Huang Jan 2015

Theses and Dissertations--Computer Science

The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We …


Indirect Association Rule Mining For Crime Data Analysis, Riley Englin Jan 2015

EWU Masters Thesis Collection

Crime data analysis is difficult to undertake. There are continuous efforts to analyze crime and determine ways to combat crime but that task is a complex one. Additionally, the nature of a domestic violence crime is hard to detect and even more difficult to predict. Recently police have taken steps to better classify domestic violence cases. The problem is that there is nominal research into this category of crime, possibly due to its sensitive nature or lack of data available for analysis, and therefore there is little known about these crimes and how they relate to others. The objectives of …


Automatic Classification Of Harmonic Data Using $k$-Means And Least Square Support Vector Machine, Hüseyin Erişti, Vedat Tümen, Özal Yildirim, Belkis Erişti, Yakup Demir Jan 2015

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, an effective classification approach to classify harmonic data has been proposed. In the proposed classifier approach, harmonic data obtained through a 3-phase system have been classified by using $k$-means and least square support vector machine (LS-SVM) models. In order to obtain class details regarding harmonic data, a $k$-means clustering algorithm has been applied to these data first. The training of the LS-SVM model has been realized with the class details obtained through the $k$-means algorithm. To increase the efficiency of the LS-SVM model, the regularization and kernel parameters of this model have been determined with a grid …
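
The first step described in the abstract above, using $k$-means to derive class labels that a supervised model can later be trained on, can be sketched as follows. This is a minimal one-dimensional illustration with assumed toy values, not the authors' pipeline: the function names and the deterministic initialisation are assumptions, real harmonic data would be multi-dimensional, and the LS-SVM training and grid search steps are not shown.

```python
def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        for v in values
    ]


def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd-style k-means on scalar data; requires k >= 2."""
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread the starting
    # centroids evenly over the sorted values.
    step = (len(ordered) - 1) / (k - 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        labels = assign(values, centroids)
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return assign(values, centroids), centroids


# Hypothetical distortion measurements forming two obvious groups.
samples = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
labels, centroids = kmeans_1d(samples, k=2)
print(labels)
```

The resulting `labels` play the role of the "class details" in the abstract: they could serve as training targets for any supervised classifier.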


Topic Analysis And Application Using Nonnegative Matrix Factorizations (NMF), Xin Wang Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

Managing large and growing amounts of information is a central goal of modern computer science. Data repositories of texts, images and videos have become widely accessible, thus necessitating good methods of retrieval, organization and exploration.


Pattern Mining And Events Discovery In Molecular Dynamics Simulations Data, Shobhit Sandesh Shakya Jan 2015

LSU Doctoral Dissertations

The molecular dynamics simulation method is widely used to calculate and understand a wide range of properties of materials. Much research effort has focused on simulation techniques, but relatively little work has been done on methods for analyzing the simulation results. Large-scale simulations usually generate massive amounts of data, which makes manual analysis infeasible, particularly when it is necessary to look into the details of the simulation results. In this dissertation, we propose a system that uses computational methods to automatically analyze simulation data, which represent atomic position-time series. The system identifies, in an automated fashion, the …


Drip - Data Rich, Information Poor: A Concise Synopsis Of Data Mining, Muhammad Obeidat, Max North, Lloyd Burgess, Sarah North Dec 2014

Faculty and Research Publications

As the production of data grows exponentially at a drastically lower cost, the data mining required to extract and discover valuable information is becoming paramount. To be functional in any business or industry, data must be capable of supporting sound decision-making and plausible prediction. The purpose of this paper is to provide a concise but broad synopsis of the technology and theory of data mining, providing an enhanced comprehension of the methods by which massive data can be transformed into meaningful information.


Twitter Location (Sometimes) Matters: Exploring The Relationship Between Georeferenced Tweet Content And Nearby Feature Classes, Stefan Hahmann, Ross S. Purves, Dirk Burghardt Dec 2014

Journal of Spatial Information Science

In this paper, we investigate whether microblogging texts (tweets) produced on mobile devices are related to the geographical locations where they were posted. For this purpose, we correlate tweet topics to areas. In doing so, classified points of interest from OpenStreetMap serve as validation points. We adopted the classification and geolocation of these points to correlate with tweet content by means of manual, supervised, and unsupervised machine learning approaches. Evaluation showed the manual classification approach to be of the highest quality, followed by the supervised method, while the unsupervised classification was of low quality. We found that the degree to which …


Social Fingerprinting: Identifying Users Of Social Networks By Their Data Footprint, Denise Koessler Gosnell Dec 2014

Doctoral Dissertations

This research defines, models, and quantifies a new metric for social networks: the social fingerprint. Just as one's fingers leave behind a unique trace in a print, the manner in which people interact with other accounts on social networks creates a unique data trail, as this dissertation introduces and demonstrates. Accurate identification of a user's social fingerprint can address the growing demand for improved techniques in unique user account analysis, computational forensics and social network analysis.

In this dissertation, we theorize, construct and test novel software and methodologies which quantify features of social network data. All approaches and methodologies are …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam Oct 2014

Research Collection School Of Computing and Information Systems

The adoption of smart card technologies and automated data collection systems (ADCS) in the transportation domain has given public transport planners opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However, the explosive growth of temporal databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


Collaborative Online Multitask Learning, Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, Wenting Liu, Ramesh Jain Aug 2014

Research Collection School Of Computing and Information Systems

We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming at classifying every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is the micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is …


A Knowledge Discovery Approach For The Detection Of Power Grid State Variable Attacks, Nathan Wallace Jul 2014

Doctoral Dissertations

As the level of sophistication in power system technologies increases, the number of system state parameters being recorded also increases. This data not only provides an opportunity for monitoring and diagnostics of a power system, but it also creates an environment wherein security can be maintained. Being able to extract relevant information from this pool of data is one of the key challenges yet to be met in the smart grid. The potential exists for the creation of innovative power grid cybersecurity applications, which harness the information gained from advanced analytics. Such analytics can be based on the extraction …