Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 23 of 23
Full-Text Articles in Physical Sciences and Mathematics
Towards A Hybrid Framework For Detecting Input Manipulation Vulnerabilities, Sun Ding, Hee Beng Kuan Tan, Lwin Khin Shar, Bindu Madhavi Padmanabhuni
Research Collection School Of Computing and Information Systems
Input manipulation vulnerabilities such as SQL injection, cross-site scripting, and buffer overflow are highly prevalent and pose critical security risks. As a result, many methods have been proposed that apply static analysis, dynamic analysis, or a combination of the two to detect such security vulnerabilities. Most existing methods classify vulnerabilities as either safe or unsafe, and they produce both false positives and false negatives. In general, a security vulnerability can be classified into one of three cases: (1) provably safe, (2) provably unsafe, (3) unsure. In this paper, we propose a hybrid framework, Detecting Input Manipulation Vulnerabilities (DIMV), to verify the adequacy of security vulnerability defenses …
Using Machine Learning Techniques To Customize The User's Profile, Helps Intelligent TV Decoder’s Design, Alketa Hyso, Roneda Mucaj
UBT International Conference
In today's society, the growing quantity of information makes it increasingly difficult to find the information we are searching for. Data mining offers us the most important methods and techniques in data analysis. Through this work, we aim to study several data mining techniques, methods, and applications in specific areas. We experiment with the open-source software WEKA to perform some data analysis, presenting the reliability and advantages of the data mining classification technique. We use the decision tree technique to achieve the task of classification, customizing user profiles based on their requirements and needs. This paper presents …
Mining Branching-Time Scenarios, Dirk Fahland, David Lo, Shahar Maoz
Research Collection School Of Computing and Information Systems
Specification mining extracts candidate specifications from existing systems, to be used for downstream tasks such as testing and verification. Specifically, we are interested in the extraction of behavior models from execution traces. In this paper we introduce mining of branching-time scenarios in the form of existential, conditional Live Sequence Charts, using a statistical data-mining algorithm. We show the power of branching scenarios to reveal alternative scenario-based behaviors, which could not be mined by previous approaches. The work contrasts and complements previous works on mining linear-time scenarios. An implementation and evaluation over execution trace sets recorded from several real-world applications shows …
Modeling Interaction Features For Debate Side Clustering, Minghui Qiu, Liu Yang, Jing Jiang
Research Collection School Of Computing and Information Systems
Online discussion forums are popular social media platforms for users to express their opinions and discuss controversial issues with each other. To automatically identify the sides/stances of posts or users from textual content in forums is an important task to help mine online opinions. To tackle the task, it is important to exploit user posts that implicitly contain support and dispute (interaction) information. The challenge we face is how to mine such interaction information from the content of posts and how to use them to help identify stances. This paper proposes a two-stage solution based on latent variable models: an …
Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall
Research Collection School Of Computing and Information Systems
Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …
Generative Models For Item Adoptions Using Social Correlation, Freddy Chong Tat Chua, Hady Wirawan Lauw, Ee Peng Lim
Research Collection School Of Computing and Information Systems
Users face many choices on the Web when it comes to choosing which product to buy, which video to watch, etc. In making adoption decisions, users rely not only on their own preferences, but also on friends. We call the latter social correlation which may be caused by the homophily and social influence effects. In this paper, we focus on modeling social correlation on users’ item adoptions. Given a user-user social graph and an item-user adoption graph, our research seeks to answer the following questions: whether the items adopted by a user correlate to items adopted by her friends, and …
Knowledge Extraction From Survey Data Using Neural Networks, Imran Ahmed Khan
Computer Science Theses
Surveys are an important tool for researchers. Survey attributes are typically discrete data measured on a Likert scale. Collected responses from the survey contain an enormous amount of data. It is increasingly important to develop powerful means for clustering such data and knowledge extraction that could help in decision-making. The process of clustering becomes complex if the number of survey attributes is large. Another major issue in Likert-Scale data is the uniqueness of tuples. A large number of unique tuples may result in a large number of patterns and that may increase the complexity of the knowledge extraction process. Also, …
Statistical Inference For Data Adaptive Target Parameters, Mark J. Van Der Laan, Alan E. Hubbard, Sara Kherad Pajouh
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-size subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample that is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
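The V-fold sample splitting described in this abstract can be illustrated with a minimal Python sketch. It only shows how each split pairs one subsample (the estimation sample) with its complement (the parameter-generating sample). The function name `v_fold_splits`, the shuffling step, and the seed are assumptions made for illustration and are not taken from the paper.

```python
import random


def v_fold_splits(n, v, seed=0):
    """Partition indices 0..n-1 into v near-equal folds.

    Returns a list of (parameter_generating, estimation) index pairs,
    one pair per fold. Hypothetical helper, not the authors' code.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random assignment is an assumption
    folds = [indices[i::v] for i in range(v)]

    splits = []
    for k in range(v):
        # The k-th fold is the estimation sample.
        estimation = sorted(folds[k])
        # Its complement is the parameter-generating sample.
        parameter_generating = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((parameter_generating, estimation))
    return splits


splits = v_fold_splits(n=10, v=5)
for parameter_generating, estimation in splits:
    print("parameter-generating:", parameter_generating, "estimation:", estimation)
```

Each observation appears in exactly one estimation sample, so the V estimation samples together cover the whole data set.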
Mining Sensor Datasets With Spatiotemporal Neighborhoods, Michael Patrick Mcguire, Vandana Janeja, Aryya Gangopadhyay
Journal of Spatial Information Science
Many spatiotemporal data mining methods are dependent on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods to automatically find spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals and a graph-based method to find spatial neighborhoods within …
The Rule Of Law In Cyberspace, Mireille Hildebrandt
Mireille Hildebrandt
This is a translation of my inaugural lecture at Radboud University Nijmegen. The Dutch version has been published as a booklet; the English version is available on my bepress site.
Localizing State-Dependent Faults Using Associated Sequence Mining, Shaimaa Ali
Electronic Thesis and Dissertation Repository
In this thesis we developed a new fault localization process to localize faults in object-oriented software. The process is built upon the "Encapsulation" principle and aims to locate state-dependent discrepancies in the software's behavior. We experimented with the proposed process on 50 seeded faults in 8 subject programs, and were able to locate the faulty class in 100% of the cases when objects with constant states were taken into consideration, while we missed 24% of the faults when these objects were not considered. We also developed a customized data mining technique, "Associated sequence mining", to be used in …
The Probit Link Function In Generalized Linear Models For Data Mining Applications, Mehdi Razzaghi
Journal of Modern Applied Statistical Methods
The use of logistic regression for outcome classification of dichotomous variables is well known in data mining applications. The estimated probability of the logit transformation belongs to the class of canonical link functions that follow from particular probability distribution functions. A closely related model is the probit link which can be used for binary responses. Although the probit link is not canonical, in some cases the overall fit of the model can be improved by using non-canonical link functions. This article reviews the properties of the probit link function and discusses its applications in data mining problems. Contrasts and comparisons …
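To make the contrast between the two links concrete, the following sketch evaluates the inverse logit link (the logistic function) and the inverse probit link (the standard normal CDF) at a few values of the linear predictor. It uses only the Python standard library. The function names are illustrative choices, and the sketch is not drawn from the article itself.

```python
import math


def inverse_logit(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_probit(eta):
    """Inverse probit link: the standard normal CDF, written with erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Compare the two response curves on a small grid of linear predictors.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta={eta:+.1f}  logit={inverse_logit(eta):.4f}  "
          f"probit={inverse_probit(eta):.4f}")
```

Both curves pass through 0.5 at a linear predictor of zero, and the probit curve approaches 0 and 1 faster in the tails, which is one reason the two models can fit the same binary data differently.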
Disclosing Climate Change Patterns Using An Adaptive Markov Chain Pattern Detection Method, Zhaoxia Wang, Gary Lee, Hoong Maeng Chan, Reuben Li, Xiuju Fu, Rick Goh, Pauline A. W. Poh Kim, Martin L. Hibberd, Hoong Chor Chin
Research Collection School Of Computing and Information Systems
This paper proposes an adaptive Markov chain pattern detection (AMCPD) method for disclosing the climate change patterns of Singapore through meteorological data mining. Meteorological variables, including daily mean temperature, mean dew point temperature, mean visibility, mean wind speed, maximum sustained wind speed, maximum temperature and minimum temperature are simultaneously considered for identifying climate change patterns in this study. The results depict various weather patterns from 1962 to 2011 in Singapore, based on the records of the Changi Meteorological Station. Different scenarios with varied cluster thresholds are employed for testing the sensitivity of the proposed method. The robustness of the proposed …
Data Near Here: Bringing Relevant Data Closer To Scientists, Veronika M. Megler, David Maier
Computer Science Faculty Publications and Presentations
Large scientific repositories run the risk of losing value as their holdings expand, if it means increased effort for a scientist to locate particular datasets of interest. We discuss the challenges that scientists face in locating relevant data, and present our work in applying Information Retrieval techniques to dataset search, as embodied in the Data Near Here application.
Document Collection Visualization And Clustering Using An Atom Metaphor For Display And Interaction, Khanh V. Nghi
Theses and Dissertations - UTB/UTPA
Visual data mining has proven to be of high value in exploratory data analysis and data mining because it provides intuitive feedback on data analysis and supports decision-making activities. Several visualization techniques have been developed for cluster discovery, such as Grand Tour, HD-Eye, and Star Coordinates. They are very useful tools that visualize data in 2D or 3D; however, they are not simple for untrained users. This thesis proposes a new approach to building a 3D clustering visualization system for document clustering using the k-means algorithm. A cluster will be represented by a neutron (centroid) and …
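For readers unfamiliar with the clustering step named in this abstract, here is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) on a handful of 2D points. It is a generic illustration and not the thesis's implementation: the toy points, the function name `k_means`, and the random initialization are all assumptions, and the thesis clusters documents, not hand-picked coordinates.

```python
import math
import random


def k_means(points, k, max_iterations=50, seed=0):
    """Plain k-means on tuples of coordinates (illustrative sketch only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignment


# Hypothetical toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(points, k=2)
print(centroids, assignment)
```

In the visualization the abstract describes, each resulting centroid would correspond to the neutron at the center of a cluster.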
Predicting Sql Injection And Cross Site Scripting Vulnerabilities Through Mining Input Sanitization Patterns, Lwin Khin Shar, Hee Beng Kuan Tan
Research Collection School Of Computing and Information Systems
Context: SQL injection (SQLI) and cross site scripting (XSS) have been the two most common and serious web application vulnerabilities for the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis techniques have been proposed. Alternatively, there are also vulnerability prediction approaches based on machine learning techniques, which showed that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities, and most of these approaches locate vulnerable code only at software component or file levels. Some approaches also involve process attributes that …
Data Mining The Functional Characterizations Of Proteins To Predict Their Cancer-Relatedness, Peter Revesz, Christopher Assi
School of Computing: Faculty Publications
This paper considers two types of protein data. First, data about protein function described in a number of ways, such as, GO terms and PFAM families. Second, data about whether individual proteins are experimentally associated with cancer by an anomalous elevation or lowering of their expressions within cancerous cells. We combine these two types of protein data and test whether the first type of data, that is, the functional descriptors, can predict the second type of data, that is, cancer-relatedness. By using data mining and machine learning, we derive a classifier algorithm that using only GO term and PFAM family …
Catching A Viral Video, T Broxton, Yannet Interian, J Vaver, M Wattenhofer
Master of Science in Analytics (MSAN) Faculty Research
The sharing and re-sharing of videos on social sites, blogs, e-mail, and other means has given rise to the phenomenon of viral videos: videos that become popular through internet sharing. In this paper we seek to better understand viral videos on YouTube by analyzing sharing and its relationship to video popularity using millions of YouTube videos. The socialness of a video is quantified by classifying the referrer sources for video views as social (e.g. an emailed link, Facebook referral) or non-social (e.g. a link from related videos). We find that viewership patterns of highly social videos are very different …
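One simple way to turn the social versus non-social classification of referrers into a number is the fraction of views that arrive from social referrers. The sketch below assumes that definition; the excerpt does not give the paper's exact formula, and the referrer labels and the function name `socialness` are hypothetical.

```python
# Hypothetical referrer labels; the real referrer taxonomy is not given here.
SOCIAL_REFERRERS = {"email_link", "facebook"}


def socialness(view_referrers):
    """Fraction of views whose referrer is classified as social.

    Assumed definition for illustration, not the paper's stated formula.
    """
    if not view_referrers:
        return 0.0
    social_views = sum(1 for r in view_referrers if r in SOCIAL_REFERRERS)
    return social_views / len(view_referrers)


# Toy example: four views, two of them from social referrers.
views = ["email_link", "related_videos", "facebook", "search"]
score = socialness(views)
print(score)
```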
Using Methods From The Data-Mining And Machine-Learning Literature For Disease Classification And Prediction: A Case Study Examining Classification Of Heart Failure Subtypes, Peter C. Austin
Peter Austin
OBJECTIVE: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines.
STUDY DESIGN AND SETTING: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) …
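Of the alternatives listed in this abstract, bootstrap aggregation (bagging) is the easiest to sketch. The example below trains several one-split classifiers (decision stumps, standing in here for full classification trees) on bootstrap resamples and predicts by majority vote. The toy data, the single numeric feature, and all function names are assumptions for illustration and are unrelated to the heart failure data in the study.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-split classifier on (x, label) pairs with labels 0 or 1."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for left_label in (0, 1):
            # Count training errors for this threshold and labeling.
            errors = sum(
                1 for x, y in samples
                if (left_label if x <= threshold else 1 - left_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def bagged_classifier(samples, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(samples) observations with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: label 0 for small x, label 1 for large x.
data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(6, 11)]
predict = bagged_classifier(data)
print(predict(0), predict(11))
```

Averaging over resamples reduces the variance of a single tree, which is the usual motivation for bagging when individual trees have limited accuracy.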
A Rule Induction Algorithm For Knowledge Discovery And Classification, Ömer Akgöbek
Turkish Journal of Electrical Engineering and Computer Sciences
Classification and rule induction are key topics in the fields of decision making and knowledge discovery. The objective of this study is to present a new algorithm developed for automatic knowledge acquisition in data mining. The proposed algorithm has been named RES-2 (Rule Extraction System). It aims at eliminating the pitfalls and disadvantages of the techniques and algorithms currently in use. The proposed algorithm makes use of the direct rule extraction approach, rather than the decision tree. For this purpose, it uses a set of examples to induce general rules. In this study, 15 datasets consisting of multiclass values with …
On Identifying Critical Nuggets Of Information During Classification Task, David Sathiaraj
LSU Doctoral Dissertations
In large databases, there may exist critical nuggets - small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focussed on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the …
A Novel Computational Framework For Transcriptome Analysis With Rna-Seq Data, Yin Hu
Theses and Dissertations--Computer Science
The advance of high-throughput sequencing technologies and their application on mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. In order to address the current limitation of analyzing accuracy and scalability in transcriptome analysis, a novel computational framework has been developed on large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected …
Exploring The Learnability Of Numeric Datasets, Di Lin
LSU Doctoral Dissertations
When doing classification, it has often been observed that datasets may exhibit different levels of difficulty with respect to how accurately they can be classified. That is, there are some datasets that can be classified very accurately by many classification algorithms, and there are others that no classifier can classify with high accuracy. Based on this observation, we try to address the following problems: a) what are the factors that make a dataset easy or difficult to classify accurately? b) how to use such factors to predict the difficulties of unclassified datasets? and c) how to …