Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

2013

Articles 1 - 19 of 19

Full-Text Articles in Computer Sciences

Towards A Hybrid Framework For Detecting Input Manipulation Vulnerabilities, Sun Ding, Hee Beng Kuan Tan, Lwin Khin Shar, Bindu Madhavi Padmanabhuni Dec 2013

Research Collection School Of Computing and Information Systems

Input manipulation vulnerabilities such as SQL injection, cross-site scripting, and buffer overflow are highly prevalent and pose critical security risks. As a result, many methods have been proposed that apply static analysis, dynamic analysis, or a combination of the two to detect such vulnerabilities. Most existing methods classify vulnerabilities as either safe or unsafe, and they produce both false positives and false negatives. In general, a security vulnerability can be classified into three cases: (1) provably safe, (2) provably unsafe, and (3) unsure. In this paper, we propose a hybrid framework, Detecting Input Manipulation Vulnerabilities (DIMV), to verify the adequacy of security vulnerability defenses …


Using Machine Learning Techniques To Customize The User's Profile, Helps Intelligent TV Decoder’s Design, Alketa Hyso, Roneda Mucaj Nov 2013

UBT International Conference

In today's society, the growing quantity of information makes it increasingly difficult to find the information we are looking for. Data mining offers the most important methods and techniques for data analysis. In this work, we study several data mining techniques and methods and their applications in specific areas. We experiment with the open-source software WEKA to perform some data analysis, demonstrating the reliability and advantages of the data mining classification technique. We use decision trees for the classification task, to customize user profiles based on users' requirements and needs. This paper presents …
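
The abstract names decision trees as the classification technique. As a minimal illustration of how a decision tree chooses its first split, the sketch below scores attributes by information gain (the ID3 criterion). The viewer attributes, values, and genre labels are hypothetical, and this is not the authors' WEKA configuration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in label entropy obtained by splitting the rows on `attr`."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_split(rows, labels, attrs):
    """Attribute with the highest information gain, i.e. the root test of an ID3-style tree."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical viewer profiles (attributes) and their preferred genre (class label).
rows = [
    {"age": "young", "slot": "evening"},
    {"age": "young", "slot": "morning"},
    {"age": "adult", "slot": "evening"},
    {"age": "adult", "slot": "morning"},
    {"age": "senior", "slot": "evening"},
    {"age": "senior", "slot": "morning"},
]
labels = ["sports", "sports", "movies", "news", "news", "news"]

print(best_split(rows, labels, ["age", "slot"]))  # age
```

A full tree would apply `best_split` recursively to each subset until the labels are pure or no attributes remain. C4.5-style learners such as WEKA's J48 use the gain ratio rather than raw information gain.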


Mining Branching-Time Scenarios, Dirk Fahland, David Lo, Shahar Maoz Nov 2013

Research Collection School Of Computing and Information Systems

Specification mining extracts candidate specifications from existing systems, to be used for downstream tasks such as testing and verification. Specifically, we are interested in extracting behavior models from execution traces. In this paper, we introduce the mining of branching-time scenarios in the form of existential, conditional Live Sequence Charts, using a statistical data-mining algorithm. We show the power of branching scenarios to reveal alternative scenario-based behaviors that could not be mined by previous approaches. This work contrasts with and complements previous work on mining linear-time scenarios. An implementation and evaluation over execution trace sets recorded from several real-world applications shows …


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Oct 2013

Research Collection School Of Computing and Information Systems

Many third-party libraries are available for download and use. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries for their projects and thus miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …
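
The abstract breaks off after "combines association …", so the full hybrid is not described here. As a hedged sketch of the association-rule idea only, the function below mines single-antecedent rules A → B from library co-usage across projects and recommends the consequents of rules whose antecedent the application already uses. The function name, the confidence threshold, and the project data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def recommend(projects, current, min_confidence=0.5, top_k=3):
    """Rank libraries absent from `current` using single-antecedent rules A -> B.

    confidence(A -> B) = (#projects using both A and B) / (#projects using A).
    Each candidate B keeps its best confidence over the antecedents A in `current`.
    """
    uses = Counter()     # number of projects using each library
    co_uses = Counter()  # number of projects using each ordered pair (A, B)
    for libs in projects:
        libs = set(libs)
        uses.update(libs)
        co_uses.update(permutations(libs, 2))

    scores = {}
    for (a, b), both in co_uses.items():
        if a in current and b not in current:
            confidence = both / uses[a]
            if confidence >= min_confidence:
                scores[b] = max(scores.get(b, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_k]


# Made-up usage data: the set of libraries each project depends on.
projects = [
    {"junit", "mockito", "guava"},
    {"junit", "mockito"},
    {"junit", "log4j"},
    {"guava", "gson"},
]
print(recommend(projects, {"junit"}))                       # ['mockito']
print(recommend(projects, {"junit"}, min_confidence=0.3))   # ['mockito', 'guava', 'log4j']
```

Lowering `min_confidence` trades precision for more candidates; a hybrid system would combine these scores with a second signal not shown here.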


Modeling Interaction Features For Debate Side Clustering, Minghui Qiu, Liu Yang, Jing Jiang Oct 2013

Research Collection School Of Computing and Information Systems

Online discussion forums are popular social media platforms where users express their opinions and discuss controversial issues with each other. Automatically identifying the sides (stances) of posts or users from the textual content of forums is an important task in mining online opinions. To tackle this task, it is important to exploit user posts that implicitly contain support and dispute (interaction) information. The challenge is how to mine such interaction information from the content of posts and how to use it to help identify stances. This paper proposes a two-stage solution based on latent variable models: an …


Generative Models For Item Adoptions Using Social Correlation, Freddy Chong Tat Chua, Hady Wirawan Lauw, Ee Peng Lim Sep 2013

Research Collection School Of Computing and Information Systems

Users face many choices on the Web when it comes to choosing which product to buy, which video to watch, etc. In making adoption decisions, users rely not only on their own preferences but also on their friends. We call the latter social correlation, which may be caused by homophily and social influence effects. In this paper, we focus on modeling social correlation in users’ item adoptions. Given a user-user social graph and an item-user adoption graph, our research seeks to answer the following questions: whether the items adopted by a user correlate with items adopted by her friends, and …


Knowledge Extraction From Survey Data Using Neural Networks, Imran Ahmed Khan Jul 2013

Computer Science Theses

Surveys are an important tool for researchers. Survey attributes are typically discrete data measured on a Likert scale. The responses collected from a survey contain an enormous amount of data, so it is increasingly important to develop powerful means of clustering such data and extracting knowledge that can support decision-making. Clustering becomes complex when the number of survey attributes is large. Another major issue with Likert-scale data is the uniqueness of tuples: a large number of unique tuples may result in a large number of patterns, which may increase the complexity of the knowledge extraction process. Also, …


Mining Sensor Datasets With Spatiotemporal Neighborhoods, Michael Patrick McGuire, Vandana Janeja, Aryya Gangopadhyay Jun 2013

Journal of Spatial Information Science

Many spatiotemporal data mining methods depend on how the relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods to automatically find spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals and a graph-based method to find spatial neighborhoods within …
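
The abstract says temporal intervals are built with an agglomerative method. A minimal sketch, assuming a single univariate sensor series and a merge criterion based on the difference between interval means (the paper's actual criterion is not given in this excerpt), could look like this; the graph-based spatial step is not shown. The function name, threshold, and readings are hypothetical.

```python
def agglomerate_intervals(series, threshold):
    """Group consecutive time steps of one sensor series into temporal intervals.

    Starts with one interval per time step and repeatedly merges the adjacent
    pair whose means differ least, stopping when that smallest difference
    exceeds `threshold`. Returns half-open (start, end) index pairs.
    """
    if not series:
        return []
    intervals = [(i, i + 1) for i in range(len(series))]

    def mean(interval):
        start, end = interval
        return sum(series[start:end]) / (end - start)

    while len(intervals) > 1:
        gaps = [abs(mean(left) - mean(right)) for left, right in zip(intervals, intervals[1:])]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > threshold:
            break
        # Merge interval i with its right-hand neighbor.
        intervals[i : i + 2] = [(intervals[i][0], intervals[i + 1][1])]
    return intervals


# Hypothetical hourly sea-surface temperatures from a single sensor.
readings = [10.0, 10.2, 10.1, 15.0, 15.3, 14.9]
print(agglomerate_intervals(readings, threshold=1.0))  # [(0, 3), (3, 6)]
```

Only adjacent intervals are ever merged, so each result stays a contiguous block of time, which is what a temporal interval requires.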


The Rule Of Law In Cyberspace, Mireille Hildebrandt Jun 2013

Mireille Hildebrandt

This is a translation of my inaugural lecture at Radboud University Nijmegen. The Dutch version has been published as a booklet; the English version is available on my bepress site.


Localizing State-Dependent Faults Using Associated Sequence Mining, Shaimaa Ali May 2013

Electronic Thesis and Dissertation Repository

In this thesis, we developed a new fault localization process for object-oriented software. The process is built upon the "Encapsulation" principle and aims to locate state-dependent discrepancies in the software's behavior. We experimented with the proposed process on 50 seeded faults in 8 subject programs and were able to locate the faulty class in 100% of the cases when objects with constant states were taken into consideration, while we missed 24% of the faults when these objects were not considered. We also developed a customized data mining technique, "Associated sequence mining", to be used in …


Disclosing Climate Change Patterns Using An Adaptive Markov Chain Pattern Detection Method, Zhaoxia Wang, Gary Lee, Hoong Maeng Chan, Reuben Li, Xiuju Fu, Rick Goh, Pauline A. W. Poh Kim, Martin L. Hibberd, Hoong Chor Chin May 2013

Research Collection School Of Computing and Information Systems

This paper proposes an adaptive Markov chain pattern detection (AMCPD) method for disclosing the climate change patterns of Singapore through meteorological data mining. Meteorological variables, including daily mean temperature, mean dew point temperature, mean visibility, mean wind speed, maximum sustained wind speed, maximum temperature and minimum temperature are simultaneously considered for identifying climate change patterns in this study. The results depict various weather patterns from 1962 to 2011 in Singapore, based on the records of the Changi Meteorological Station. Different scenarios with varied cluster thresholds are employed for testing the sensitivity of the proposed method. The robustness of the proposed …
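
The excerpt does not spell out how AMCPD works. As a hedged illustration of the Markov chain building block only, the sketch below discretizes a single variable into weather states and estimates first-order transition probabilities by counting. The cut points, state names, and readings are hypothetical, and the adaptive, multi-variable clustering described in the abstract is not shown.

```python
from collections import Counter, defaultdict


def discretize(values, cut_points, states):
    """Map each value to a state label using ascending cut points (len(states) == len(cut_points) + 1)."""
    return [states[sum(v >= c for c in cut_points)] for v in values]


def transition_matrix(sequence):
    """Maximum-likelihood estimate of first-order Markov transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Hypothetical daily mean temperatures (degrees C); not Changi station data.
temps = [26.5, 27.8, 28.3, 27.1, 26.0, 28.9]
states = discretize(temps, cut_points=[27.0, 28.0], states=["cool", "mild", "warm"])
print(states)                             # ['cool', 'mild', 'warm', 'mild', 'cool', 'warm']
print(transition_matrix(states)["cool"])  # {'mild': 0.5, 'warm': 0.5}
```

Each row of the resulting matrix sums to 1, so comparing matrices estimated over different periods is one simple way to see whether the dynamics of the weather states change.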


Document Collection Visualization And Clustering Using An Atom Metaphor For Display And Interaction, Khanh V. Nghi May 2013

Theses and Dissertations - UTB/UTPA

Visual data mining has proven to be of high value in exploratory data analysis and data mining because it provides intuitive feedback on data analysis and supports decision-making activities. Several visualization techniques have been developed for cluster discovery, such as Grand Tour, HD-Eye, and Star Coordinates. They are very useful tools that display data in 2D or 3D; however, they are not simple to use for untrained users. This thesis proposes a new approach to building a 3D clustering visualization system for document clustering using the k-means algorithm. A cluster will be represented by a neutron (centroid) and …
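
The thesis builds its document clusters with k-means, each cluster represented by its centroid. The sketch below shows plain k-means (Lloyd's algorithm) on toy term-count vectors. The document vectors, the Euclidean distance, and the random initialization are assumptions for illustration, and the 3D atom-metaphor display is not shown.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (centroids, assignment), where assignment[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k starting centroids drawn from the data
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical term counts for four documents over the terms ("goal", "match", "election").
docs = [(3, 0, 0), (2, 1, 0), (0, 0, 4), (0, 1, 3)]
centroids, labels = kmeans(docs, k=2)
print(labels[0] == labels[1] and labels[2] == labels[3] and labels[0] != labels[2])  # True
```

Real document clustering usually weights terms (for example with TF-IDF) and may prefer cosine distance; those choices are left out here to keep the core loop visible.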


Data Near Here: Bringing Relevant Data Closer To Scientists, Veronika M. Megler, David Maier May 2013

Computer Science Faculty Publications and Presentations

Large scientific repositories run the risk of losing value as their holdings expand, if that growth means increased effort for a scientist to locate particular datasets of interest. We discuss the challenges that scientists face in locating relevant data and present our work in applying Information Retrieval techniques to dataset search, as embodied in the Data Near Here application.


Predicting SQL Injection And Cross Site Scripting Vulnerabilities Through Mining Input Sanitization Patterns, Lwin Khin Shar, Hee Beng Kuan Tan Apr 2013

Research Collection School Of Computing and Information Systems

Context: SQL injection (SQLI) and cross-site scripting (XSS) have been the two most common and serious web application vulnerabilities over the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis techniques have been proposed. Alternatively, there are vulnerability prediction approaches based on machine learning techniques, which have shown that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities, and most of them locate vulnerable code only at the software component or file level. Some approaches also involve process attributes that …


Data Mining The Functional Characterizations Of Proteins To Predict Their Cancer-Relatedness, Peter Revesz, Christopher Assi Feb 2013

School of Computing: Faculty Publications

This paper considers two types of protein data. The first is data about protein function, described in a number of ways, such as GO terms and PFAM families. The second is data about whether individual proteins are experimentally associated with cancer through an anomalous elevation or lowering of their expression within cancerous cells. We combine these two types of protein data and test whether the first type, the functional descriptors, can predict the second type, cancer-relatedness. Using data mining and machine learning, we derive a classifier algorithm that, using only GO term and PFAM family …


A Rule Induction Algorithm For Knowledge Discovery And Classification, Ömer Akgöbek Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

Classification and rule induction are key topics in the fields of decision making and knowledge discovery. The objective of this study is to present a new algorithm developed for automatic knowledge acquisition in data mining. The proposed algorithm has been named RES-2 (Rule Extraction System). It aims at eliminating the pitfalls and disadvantages of the techniques and algorithms currently in use. The proposed algorithm makes use of the direct rule extraction approach, rather than the decision tree. For this purpose, it uses a set of examples to induce general rules. In this study, 15 datasets consisting of multiclass values with …
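
The excerpt does not describe RES-2's steps, only that it extracts rules directly from examples rather than from a decision tree. A generic separate-and-conquer (sequential covering) learner is one common form of direct rule extraction; the sketch below is that generic scheme, not RES-2, and the function names and weather-style examples are hypothetical.

```python
def _score(covered, target, condition):
    """(precision, coverage) of an attribute=value test on the currently covered examples."""
    attr, value = condition
    matched = [label for row, label in covered if row[attr] == value]
    return (sum(label == target for label in matched) / len(matched), len(matched))


def learn_rules(rows, labels, attrs):
    """Separate-and-conquer rule induction producing an ordered rule list.

    Each rule is (conditions, label); an empty conditions dict acts as a default rule.
    """
    rules = []
    remaining = list(zip(rows, labels))
    while remaining:
        target = remaining[0][1]
        conditions = {}
        covered = remaining
        # Conquer: specialize the rule until it covers only `target` examples (or no tests are left).
        while any(label != target for _, label in covered):
            candidates = [
                (attr, row[attr])
                for row, label in covered
                if label == target
                for attr in attrs
                if attr not in conditions
            ]
            if not candidates:
                break
            attr, value = max(candidates, key=lambda cond: _score(covered, target, cond))
            conditions[attr] = value
            covered = [(row, label) for row, label in covered if row[attr] == value]
        rules.append((conditions, target))
        # Separate: drop the examples this rule covers and learn the next rule.
        remaining = [
            (row, label)
            for row, label in remaining
            if not all(row[a] == v for a, v in conditions.items())
        ]
    return rules


def classify(rules, row):
    """Label from the first rule whose conditions all match `row`."""
    for conditions, label in rules:
        if all(row.get(a) == v for a, v in conditions.items()):
            return label
    return None


rows = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "sunny", "wind": "strong"},
    {"outlook": "rainy", "wind": "weak"},
    {"outlook": "rainy", "wind": "strong"},
]
labels = ["play", "stay", "stay", "stay"]
rules = learn_rules(rows, labels, ["outlook", "wind"])
print(rules)  # [({'outlook': 'sunny', 'wind': 'weak'}, 'play'), ({}, 'stay')]
```

Because the rules are applied in order, the final rule with no conditions serves as a default for any example the earlier rules do not match.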


A Novel Computational Framework For Transcriptome Analysis With RNA-Seq Data, Yin Hu Jan 2013

Theses and Dissertations--Computer Science

The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets that does not depend on transcript annotations. Starting directly from raw reads, the framework first applies a probabilistic approach to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected …


On Identifying Critical Nuggets Of Information During Classification Task, David Sathiaraj Jan 2013

LSU Doctoral Dissertations

In large databases, there may exist critical nuggets - small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focussed on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the …


Exploring The Learnability Of Numeric Datasets, Di Lin Jan 2013

LSU Doctoral Dissertations

When doing classification, it has often been observed that datasets may exhibit different levels of difficulty with respect to how accurately they can be classified. That is, some datasets can be classified very accurately by many classification algorithms, while others cannot be classified with high accuracy by any classifier. Based on this observation, we try to address the following problems: a) what are the factors that make a dataset easy or difficult to classify accurately? b) how can such factors be used to predict the difficulty of unclassified datasets? and c) how to …