Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons


Data mining

2015

Articles 1 - 18 of 18

Full-Text Articles in Computer Sciences

Capstone Projects Mining System For Insights And Recommendations, Melvrivk Aik Chun Goh, Swapna Gottipati, Venky Shankararaman Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we present a classification-based system to discover knowledge and trends in higher education students' projects. Educational capstone projects give students an opportunity to apply what they have learned and to prepare themselves for industry needs. Mining such projects therefore yields insights into students' experiences as well as industry project requirements and trends. In particular, we mine capstone projects executed by Information Systems students to discover patterns and insights related to people, organization, domain, industry needs and time. We build a capstone projects mining system (CPMS) based on classification models that leverage text mining, natural …


Intelligshop: Enabling Intelligent Shopping In Malls Through Location-Based Augmented Reality, Aditi Adhikari, Vincent W. Zheng, Hong Cao, Miao Lin, Yuan Fang, Kevin Chen-Chuan Chang Nov 2015

Research Collection School Of Computing and Information Systems

Shopping experience is important for both citizens and tourists. We present IntelligShop, a novel location-based augmented reality application that supports an intelligent shopping experience in malls. As its key functionality, IntelligShop provides an augmented reality interface: people simply point their ubiquitous smartphones at mall retailers, and IntelligShop automatically recognizes the retailers and fetches their online reviews from various sources (including blogs, forums and publicly accessible social media) to display on the phones. Technically, IntelligShop addresses two challenging data mining problems: robust feature learning to support heterogeneous smartphones in localization, and learning to query for automatically gathering the retailer content …


The Importance Of Being Isolated: An Empirical Study On Chromium Reviews, Subhajit Datta, Devarshi Bhatt, Manish Jain, Proshanta Sarkar, Santonu Sarkar Oct 2015

Research Collection School Of Computing and Information Systems

As large-scale software development has become more collaborative, and software teams more globally distributed, several studies have explored how developer interaction influences software development outcomes. The emphasis so far has been largely on outcomes such as defect counts and the time taken to close modification requests. In this paper, we examine data from the Chromium project to understand how different aspects of developer discussion relate to the closure time of reviews. Based on an analysis of reviews discussed by more than 2,000 developers, our results indicate that quicker closure of reviews owned by a developer relates to higher reception of information and insights …


Clustering-Based Personalization, Seyed Nima Mirbakhsh Sep 2015

Electronic Thesis and Dissertation Repository

Recommendation systems have been one of the most rapidly emerging technologies of the last decade and are a key part of the e-commerce ecosystem. Businesses offer a wide variety of items and content through different channels such as the Internet, smart TVs and digital screens. For some businesses the number of items can exceed millions, so users can have trouble finding the products they are looking for. Recommendation systems address this problem with powerful methods that let users filter a large information and product space according to their preferences. Moreover, users have different preferences. Thus, businesses can employ recommendation …


Rough-Fuzzy Hybrid Approach For Identification Of Bio-Markers And Classification On Alzheimer's Disease Data, Changsu Lee, Chiou-Peng Lam, Martin Masek Jul 2015

Martin Masek

A new approach is proposed in this paper for identification of biomarkers and classification on Alzheimer's disease data by employing a rough-fuzzy hybrid approach called ARFIS (a framework for Adaptive TS-type Rough-Fuzzy Inference Systems). In this approach, the entropy-based discretization technique is employed first on the training data to generate clusters for each attribute with respect to the output information. The rough set-based feature reduction method is then utilized to reduce the number of features in a decision table obtained using the cluster information. Another rough set-based approach is employed for the generation of decision rules. After the construction and …
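The entropy-based discretization step described above can be illustrated with a minimal sketch. Assuming a single numeric attribute and one class label per training row, it picks the boundary that minimizes the weighted class entropy of the two resulting intervals and then recurses on each side. The fixed recursion depth used as the stopping rule is an assumption made for brevity (MDL-based criteria are a common alternative); this is not the ARFIS implementation, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_cut(values, labels):
    """Return (cut_point, weighted_entropy) of the boundary with the lowest class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if e < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, e)
    return best


def discretize(values, labels, depth=2):
    """Recursively collect sorted cut points; stop at depth 0 or when an interval is pure."""
    if depth == 0 or len(set(labels)) < 2:
        return []
    cut, _ = best_cut(values, labels)
    if cut is None:
        return []
    left = [(v, lab) for v, lab in zip(values, labels) if v <= cut]
    right = [(v, lab) for v, lab in zip(values, labels) if v > cut]
    return (discretize(*zip(*left), depth=depth - 1) + [cut]
            + discretize(*zip(*right), depth=depth - 1))


print(discretize([1, 2, 3, 10, 11, 12], ["a"] * 3 + ["b"] * 3))       # [6.5]
print(discretize([1, 2, 3, 4, 5, 6], ["a", "a", "b", "b", "a", "a"]))  # [2.5, 4.5]
```

Each cut point turns one attribute into labelled intervals, which is the kind of per-attribute cluster information a later rough set step could consume.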


Exploratory Data Modeling Of Traumatic Brain Injury, Martin Zwick Jun 2015

Systems Science Faculty Publications and Presentations

A short presentation of an analysis of data from Dr. Megan Preece on traumatic brain injury, the first in a series of planned secondary analyses of multiple TBI data sets. The analysis employs the systems methodology of reconstructability analysis (RA), utilizing both variable- and state-based and both neutral and directed models. The presentation explains RA and illustrates the results it can obtain. Unlike the confirmatory approach standard to most data analyses, this methodology is designed for exploratory modeling. It thus allows the discovery of unanticipated associations among variables, including multi-variable interaction effects of unknown form. It offers the opportunity for …


Should We Use The Sample? Analyzing Datasets Sampled From Twitter's Stream API, Yazhe Wang, Jamie Callan, Baihua Zheng Jun 2015

Research Collection School Of Computing and Information Systems

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill …


Author Topic Model-Based Collaborative Filtering For Personalized POI Recommendations, Shuhui Jiang, Xueming Qian, Jialie Shen, Yun Fu, Tao Mei Jun 2015

Research Collection School Of Computing and Information Systems

Social media has created a continuous need for automatic travel recommendations. Collaborative filtering (CF) is the best-known approach. However, existing approaches generally suffer from various weaknesses. For example, sparsity can significantly degrade the performance of traditional CF: if a user visits only a few locations, identifying similar users accurately becomes very challenging because there is too little information for effective inference. Moreover, existing recommendation approaches often ignore rich user information, such as textual descriptions of photos, that can reflect users' travel preferences. The topic model (TM) method is an effective way to solve the "sparsity problem," but is still far …
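To make the sparsity problem concrete, here is a minimal sketch of plain user-based collaborative filtering, that is, the traditional CF baseline the abstract criticizes, not the proposed author-topic-model method. Each user is assumed to be a dictionary of location visit counts; the user and location names are hypothetical. A user whose visits overlap with nobody else's gets zero similarity to every neighbour and therefore no recommendations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {location: visit_count} dicts."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(target, visits, k=2, top_n=3):
    """Rank unvisited locations by the similarity-weighted visits of the k nearest users."""
    neighbours = sorted(((cosine(visits[target], visits[u]), u)
                         for u in visits if u != target), reverse=True)[:k]
    scores = {}
    for sim, u in neighbours:
        if sim <= 0:
            continue  # no shared locations, so this neighbour offers no evidence
        for loc, count in visits[u].items():
            if loc not in visits[target]:
                scores[loc] = scores.get(loc, 0.0) + sim * count
    return sorted(scores, key=lambda loc: (-scores[loc], loc))[:top_n]


# Hypothetical visit counts.
visits = {
    "alice": {"museum": 3, "park": 1},
    "bob": {"museum": 2, "park": 1, "zoo": 4},
    "carol": {"beach": 5},
    "dave": {"museum": 1, "harbour": 2},
}
print(recommend("alice", visits))  # ['zoo', 'harbour']
print(recommend("carol", visits))  # []: carol shares no location with anyone
```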


Data Mining In Computational Proteomics And Genomics, Yang Song May 2015

Dissertations

This dissertation addresses data mining in bioinformatics by investigating two important problems, namely peak detection and structure matching. Peak detection is useful for biological pattern discovery while structure matching finds many applications in clustering and classification.

The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a …


Data Mining Temporal Work Patterns Of Programming Student Populations, Dale E. Parson, Lori Bogumil, Allison Seidel Apr 2015

Computer Science and Information Technology Faculty

This paper reports the second stage of a study of the correlations between the temporal work patterns of computer programming students and their success or failure as measured by programming project assignment grades and related metrics. The first stage confirmed the importance for most students of getting an early start on a programming project, and it also uncovered the fact that some student groups perform well with late starts, suggesting the likelihood that they engage in the productive practice of active procrastination. The second most important factor for success is the average length of assignment work sessions. Session lengths from …
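One way to compute the "average length of assignment work sessions" mentioned above is sketched below. The abstract does not say how sessions are delimited, so this sketch assumes a hypothetical rule: timestamped work events (in seconds) belong to the same session unless they are separated by more than a 30-minute gap. Both the threshold and the function names are illustrative assumptions.

```python
def sessions(timestamps, max_gap=30 * 60):
    """Split event times (seconds) into (start, end) sessions at gaps longer than max_gap."""
    ts = sorted(timestamps)
    if not ts:
        return []
    spans = [[ts[0], ts[0]]]
    for t in ts[1:]:
        if t - spans[-1][1] > max_gap:
            spans.append([t, t])  # gap too long: start a new session
        else:
            spans[-1][1] = t      # extend the current session
    return [(start, end) for start, end in spans]


def average_session_length(timestamps, max_gap=30 * 60):
    """Mean session duration in seconds (0.0 if there are no events)."""
    spans = sessions(timestamps, max_gap)
    return sum(end - start for start, end in spans) / len(spans) if spans else 0.0


events = [0, 600, 1200, 10000, 10300]  # hypothetical work-event times for one student
print(sessions(events))                # [(0, 1200), (10000, 10300)]
print(average_session_length(events))  # 750.0
```

Changing `max_gap` changes what counts as one session, so any such metric is only comparable across students when the same threshold is used.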


Improving Software Quality And Productivity Leveraging Mining Techniques: [Summary Of The Second Workshop On Software Mining, At ASE 2013], Ming Li, Hongyu Zhang, David Lo, Lucia Lucia Jan 2015

Research Collection School Of Computing and Information Systems

The second International Workshop on Software Mining (Soft-mine) was held on 11 November 2013 in conjunction with the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) in Silicon Valley, California, USA. The workshop brought together researchers interested in mining various types of software-related data and in applying data mining techniques to support software engineering tasks. Seven papers on software mining and behavior models, execution trace mining, and bug localization and fixing were presented, one of which received the best paper award. Furthermore, there were two invited talk …


Pattern Mining And Events Discovery In Molecular Dynamics Simulations Data, Shobhit Sandesh Shakya Jan 2015

LSU Doctoral Dissertations

The molecular dynamics simulation method is widely used to calculate and understand a wide range of material properties. Much research effort has focused on simulation techniques, but relatively little work has addressed methods for analyzing the simulation results. Large-scale simulations usually generate massive amounts of data, which makes manual analysis infeasible, particularly when the details of the simulation results must be examined. In this dissertation, we propose a system that uses computational methods to automatically analyze simulation data, which represent atomic position-time series. The system identifies, in an automated fashion, the …


Novel Computational Methods For Transcript Reconstruction And Quantification Using RNA-Seq Data, Yan Huang Jan 2015

Theses and Dissertations--Computer Science

The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition, such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We …


Automatic Classification Of Harmonic Data Using K-Means And Least Square Support Vector Machine, Hüseyin Erişti, Vedat Tümen, Özal Yildirim, Belkis Erişti, Yakup Demir Jan 2015

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, an effective approach to classifying harmonic data is proposed. In the proposed approach, harmonic data obtained from a 3-phase system are classified using k-means and least square support vector machine (LS-SVM) models. To obtain class details for the harmonic data, a k-means clustering algorithm is first applied to these data. The LS-SVM model is then trained with the class details obtained from the k-means algorithm. To increase the efficiency of the LS-SVM model, its regularization and kernel parameters have been determined with a grid …
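The two-stage idea in this abstract (k-means supplies class labels, which then train an LS-SVM) can be sketched in plain Python. This is a minimal illustration under several assumptions: a binary problem with two clusters, toy 2-D points in place of real harmonic measurements, an RBF kernel, and fixed regularization (`gamma`) and kernel width (`sigma`) values instead of the tuning procedure the abstract mentions. The LS-SVM follows the standard least-squares formulation, in which training reduces to solving one linear system; all function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rbf(x, z, sigma):
    """Gaussian (RBF) kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def solve(A, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Binary LS-SVM: solve [[0, y^T], [y, Omega + I/gamma]] [b, alpha] = [0, 1]."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        A.append([float(y[i])] + [y[i] * y[j] * rbf(X[i], X[j], sigma)
                                  + (1.0 / gamma if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Sign of the LS-SVM decision function at x."""
    s = sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1


# Toy 2-D points standing in for harmonic measurements (hypothetical data).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = kmeans(points, 2)                             # stage 1: unsupervised class details
y = [1 if lab == labels[0] else -1 for lab in labels]  # map the two clusters to +1 / -1
b, alpha = lssvm_train(points, y)                      # stage 2: supervised LS-SVM
print(lssvm_predict((0.15, 0.05), points, y, b, alpha))  # 1
```

A multi-class version would typically train one such binary model per class (one-vs-rest); that extension is not shown here.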


E-Mail Authorship Attribution Using Customized Associative Classification, Michael R. Schmid, Farkhund Iqbal, Benjamin C.M. Fung Jan 2015

All Works

E-mail communication is often abused for conducting social engineering attacks including spamming, phishing, identity theft and for distributing malware. This is largely attributed to the problem of anonymity inherent in the standard electronic mail protocol. In the literature, authorship attribution is studied as a text categorization problem where the writing styles of individuals are modeled based on their previously written sample documents. The developed model is employed to identify the most plausible writer of the text. Unfortunately, most existing studies focus solely on improving predictive accuracy and not on the inherent value of the evidence collected. In this study, we …
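The text-categorization framing of authorship attribution described above can be sketched with a deliberately simple stand-in: character trigram profiles compared by cosine similarity, with the anonymous message assigned to the closest candidate. This is not the customized associative classification the study proposes; the candidate names and sample messages are hypothetical.

```python
import math
from collections import Counter


def profile(text, n=3):
    """Character n-gram counts, a common stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram Counters."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(anonymous, samples):
    """Return the candidate whose pooled sample texts are closest to the anonymous text."""
    target = profile(anonymous)
    scores = {author: cosine(target, profile(" ".join(texts)))
              for author, texts in samples.items()}
    return max(scores, key=scores.get)


# Hypothetical candidates with a few known e-mails each.
samples = {
    "author_a": ["Hi team, pls send the report asap. thx", "pls check the numbers asap. thx"],
    "author_b": ["Dear colleagues, I would be grateful if you could kindly forward the report.",
                 "Kind regards, and thank you for your patience."],
}
print(attribute("pls send the slides asap. thx", samples))  # author_a
```

A sketch like this only returns the best-scoring candidate; it does not produce the kind of inspectable evidence the abstract argues is valuable.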


Indirect Association Rule Mining For Crime Data Analysis, Riley Englin Jan 2015

EWU Masters Thesis Collection

"Crime data analysis is difficult to undertake. There are continuous efforts to analyze crime and determine ways to combat crime but that task is a complex one. Additionally, the nature of a domestic violence crime is hard to detect and even more difficult to predict. Recently police have taken steps to better classify domestic violence cases. The problem is that there is nominal research into this category of crime, possibly due to its sensitive nature or lack of data available for analysis, and therefore there is little known about these crimes and how they relate to others. The objectives of …


Topic Analysis And Application Using Nonnegative Matrix Factorizations (NMF), Xin Wang Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

Managing a large and growing amount of information is a central goal of modern computer science. Data repositories of texts, images and videos have become widely accessible, necessitating good methods of retrieval, organization and exploration.


A Theory Of Name Resolution, Pierre Néron, Andrew Tolmach, Eelco Visser, Guido Wachsmuth Jan 2015

Computer Science Faculty Publications and Presentations

We describe a language-independent theory for name binding and resolution, suitable for programming languages with complex scoping rules including both lexical scoping and modules. We formulate name resolution as a two-stage problem. First a language-independent scope graph is constructed using language-specific rules from an abstract syntax tree. Then references in the scope graph are resolved to corresponding declarations using a language-independent resolution process. We introduce a resolution calculus as a concise, declarative, and language-independent specification of name resolution. We develop a resolution algorithm that is sound and complete with respect to the calculus. Based on the resolution calculus we …
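The second stage, resolving references against a scope graph, can be pictured with a toy sketch. It hand-builds a small graph (the first stage, deriving the graph from an abstract syntax tree with language-specific rules, is omitted) and assumes a simple visibility policy: local declarations shadow imported ones, which in turn shadow those reachable through the lexical parent. The `Scope` class and `resolve` function are hypothetical and far simpler than the resolution calculus the paper defines; in particular, imports here are not transitive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Scope:
    """A node in a toy scope graph: local declarations plus parent and import edges."""
    name: str
    decls: Set[str] = field(default_factory=set)
    parent: Optional["Scope"] = None
    imports: List["Scope"] = field(default_factory=list)


def resolve(ref: str, scope: Optional[Scope]) -> Optional[Scope]:
    """Return the scope declaring `ref`, or None if it is unbound.

    Assumed policy: local declarations shadow imported ones, which shadow
    declarations reachable through the lexical parent. Imports are not transitive.
    """
    while scope is not None:
        if ref in scope.decls:
            return scope
        for module in scope.imports:
            if ref in module.decls:
                return module
        scope = scope.parent
    return None


# Hypothetical program: a global scope, a module A, and a function body that imports A.
global_scope = Scope("global", {"x", "f"})
module_a = Scope("A", {"y"}, parent=global_scope)
body = Scope("f-body", {"x"}, parent=global_scope, imports=[module_a])

for ref in ["x", "y", "f", "z"]:
    target = resolve(ref, body)
    print(ref, "->", target.name if target else "unbound")
```

Keeping the graph separate from the lookup is what makes the resolution step language-independent: only the code that builds `Scope` nodes would need to know the source language.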