Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 22 of 22
Full-Text Articles in Computer Sciences
Human-Readable Real-Time Classifications Of Malicious Executables, Anselm Teh, Arran Stewart
Australian Information Security Management Conference
Shafiq et al. (2009a) propose a non–signature-based technique for detecting malware which applies data mining techniques to features extracted from executable files. Their technique has a high level of accuracy, a low false positive rate, and a speed on par with commercial anti-virus products. One portion of their technique uses a multi-layer perceptron as a classifier, which provides little insight into the reasons for classification. Our experience is that network security analysts prefer tools which provide human-comprehensible reasons for a classification, rather than operating as “black boxes”. We therefore build on the results of Shafiq et al. by demonstrating a …
Data Mining Of Pancreatic Cancer Protein Databases, Peter Revesz, Christopher Assi
CSE Conference and Workshop Papers
Data mining of protein databases poses special challenges because many protein databases are non-relational whereas most data mining and machine learning algorithms assume the input data to be a type of relational database that is also representable as an ARFF file. We developed a method to restructure protein databases so that they become amenable for various data mining and machine learning tools. Our restructuring method enabled us to apply both decision tree and support vector machine classifiers to a pancreatic protein database. The SVM classifier that used both GO term and PFAM families to characterize proteins gave …
Exploring Place Through User-Generated Content: Using Flickr Tags To Describe City Cores, Livia Hollenstein, Ross Purves
Journal of Spatial Information Science
Terms used to describe city centers, such as Downtown, are key concepts in everyday or vernacular language. Here, we explore such language by harvesting georeferenced and tagged metadata associated with 8 million Flickr images and thus consider how large numbers of people name city core areas. The nature of errors and imprecision in tagging and georeferencing are quantified, and automatically generated precision measures appear to mirror errors in the positioning of images. Users seek to ascribe appropriate semantics to images, though bulk-uploading and bulk-tagging may introduce bias. Between 0.5% and 2% of tags associated with georeferenced images analyzed describe city core areas …
Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini
Doctoral Dissertations
Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of "Big Data" synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains which is particularly detrimental to the …
Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin
Computational Modeling & Simulation Engineering Theses & Dissertations
Initialization is one of the most important processes for obtaining successful results from a simulation. However, initialization is a challenge when 1) a simulation requires hundreds or even thousands of input parameters or 2) re-initializing the simulation due to different initial conditions or runtime errors. These challenges lead to the modeler spending more time initializing a simulation and may lead to errors due to poor input data.
This thesis proposes two semi-automatic simulation initialization approaches that provide initialization using data mining from structured and unstructured data formats from local and web data sources. First, the System Initialization with Retrieval (SIR) …
Building A Computer Program To Support Children, Parents, And Distraction During Healthcare Procedures, Kirsten Hanrahan, Ann Marie Mccarthy, Charmaine Kleiber, Kaan Ataman, W. Nick Street, M. Bridget Zimmerman, Annel L. Ersig
Business Faculty Articles and Research
This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children's responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model …
A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson
Theses and Dissertations
Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in Environmental Data sets, as well as many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …
From Clickstreams To Searchstreams: Search Network Graph Evidence From A B2b E-Market, Mei Lin, M. F. Lin, Robert J. Kauffman
Research Collection School Of Computing and Information Systems
Consumers in e-commerce acquire information through search engines, yet to date there has been little empirical study on how users interact with the results produced by search engines. This is analogous to, but different from, the ever-expanding research on clickstreams, where users interact with static web pages. We propose a new network approach to analyzing search engine server log data. We call this searchstream data. We create graph representations based on the web pages that users traverse as they explore the search results that their use of search engines generates. We then analyze the graph-level properties of these search network …
Data Mining Of Protein Databases, Christopher Assi
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Data mining of protein databases poses special challenges because many protein databases are non-relational whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable for various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …
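To illustrate the general idea of making set-valued data relational, here is a minimal sketch that flattens a set-valued attribute into one binary column per distinct value. This is an assumption for illustration only, not the restructuring algorithm from the thesis, which the truncated abstract does not specify. The function name, field names, and sample records are hypothetical.

```python
def flatten_set_attribute(records, key):
    """Turn a set-valued field into one 0/1 column per distinct value."""
    # Collect every value that appears in the set-valued field, in sorted order
    vocabulary = sorted({value for record in records for value in record[key]})
    rows = []
    for record in records:
        # Copy the scalar fields unchanged
        row = {k: v for k, v in record.items() if k != key}
        # Add one indicator column per vocabulary entry
        for value in vocabulary:
            row[f"{key}={value}"] = 1 if value in record[key] else 0
        rows.append(row)
    return rows


# Hypothetical records: each protein carries a set of annotation terms
proteins = [
    {"id": "P1", "terms": {"binding", "transport"}},
    {"id": "P2", "terms": {"binding"}},
]
table = flatten_set_attribute(proteins, "terms")
```

Every resulting row has the same fixed set of columns, which is the shape that tabular classifiers such as decision trees and SVMs typically expect.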
Mining Input Sanitization Patterns For Predicting Sql Injection And Cross Site Scripting Vulnerabilities, Lwin Khin Shar, Hee Beng Kuan Tan
Research Collection School Of Computing and Information Systems
Static code attributes such as lines of code and cyclomatic complexity have been shown to be useful indicators of defects in software modules. As web applications adopt input sanitization routines to prevent web security risks, static code attributes that represent the characteristics of these routines may be useful for predicting web application vulnerabilities. In this paper, we classify various input sanitization methods into different types and propose a set of static code attributes that represent these types. Then we use data mining methods to predict SQL injection and cross site scripting vulnerabilities in web applications. Preliminary experiments show that our …
Data Mining Of Tetraloop-Tetraloop Receptors In Rna Xml Files, Sinan Ramazanoglu
Theses
RNA (Ribonucleic acid) Motifs are tertiary structures that play an important role in the folding mechanism of the RNA molecule. The overall function of an RNA Motif depends on its specific bp (base pairs) sequence that constitutes the secondary structure. Data mining is a novel method for both discovering potential tertiary structures within DNA (Deoxyribonucleic acid), RNA, and protein molecules and storing the information in databases. The RNA Motif of interest is the tetraloop-tetraloop receptor, which is composed of a highly conserved 11 nt (nucleotide) sequence and a tetraloop with the generic form of GNRA (where N = any base …
Ensemble Of Feature Selection Techniques For High Dimensional Data, Sri Harsha Vege
Masters Theses & Specialist Projects
Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships from large amounts of data stored in databases, data warehouses, or other information repositories. Feature selection is an important preprocessing step of data mining that helps increase the predictive performance of a model. The main aim of feature selection is to choose a subset of features with high predictive information and eliminate irrelevant features with little or no predictive information. Using a single feature selection technique may generate local optima.
In this thesis we propose an ensemble approach for feature selection, where multiple …
Analysis And Characterization Of Author Contribution Patterns In Open Source Software Development, Quinn Carlson Taylor
Theses and Dissertations
Software development is a process fraught with unpredictability, in part because software is created by people. Human interactions add complexity to development processes, and collaborative development can become a liability if not properly understood and managed. Recent years have seen an increase in the use of data mining techniques on publicly-available repository data with the goal of improving software development processes, and by extension, software quality. In this thesis, we introduce the concept of author entropy as a metric for quantifying interaction and collaboration (both within individual files and across projects), present results from two empirical observational studies of open-source …
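As a rough illustration of an entropy-style collaboration metric, the sketch below computes Shannon entropy over each author's share of the lines in a file. The truncated abstract does not define author entropy, so this formulation, the function name, and the sample counts are assumptions, not the thesis's definition.

```python
import math


def author_entropy(lines_by_author):
    """Shannon entropy (in bits) of the per-author share of lines in one file."""
    total = sum(lines_by_author.values())
    if total == 0:
        return 0.0  # an empty file has no contributions to measure
    entropy = 0.0
    for count in lines_by_author.values():
        if count > 0:  # authors with zero lines contribute nothing
            share = count / total
            entropy -= share * math.log2(share)
    return entropy


# Hypothetical line counts attributed to each author of a file
single_author = author_entropy({"alice": 120})
even_split = author_entropy({"alice": 50, "bob": 50})
```

Under this assumption a file written by one author scores 0 bits, and a file split evenly between two authors scores 1 bit, so higher values indicate contributions spread more evenly across more people.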
Applying Data Mining Techniques In The Selection Of Plant Traits, Dean Diepeveen, Leisa Armstrong
Leisa Armstrong
In the agricultural sector, farmers are provided with crop-related information by various research agencies in order to make critical decisions about which is the most profitable crop variety choice. Research agencies provide information which is generic, rather than being tailored to the individual farmer's cropping situation. A number of specific plant and growth traits are used to establish the most suitable crop varieties. When selecting crop varieties for release to growers, the application of data mining techniques to crop research data enables the customization of information to each individual farmer's farming situation. The challenge for agricultural research perspective is …
An Evaluation Of Methodologies For Eagriculture In An Australian Context, Leisa Armstrong, Dean Diepeveen
Leisa Armstrong
Australian agricultural producers’ profits are dependent on the decisions they make about farm productivity systems. They may use recommendations and information provided by government agencies and private consultants. For cereal growers, success is dependent on decisions made about selection of crop varieties suitable for their agronomic and climatic conditions. This paper reports on research which aimed to evaluate some current eAgriculture methodologies for their application in the Western Australian agricultural industry. In particular the paper illustrates the findings from a project which aimed to explain the variability seen in crop varieties grown in Western Australia. The problems associated with crop …
An Eagriculture-Based Decision Support Framework For Information Dissemination, Leisa Armstrong, Dean Diepeveen, Khumphicha Tantisantisom
Leisa Armstrong
The ability of farmers to acquire knowledge to make decisions is limited by the information quality and applicability. Inconsistencies in information delivery and standards for the integration of information also limit decision making processes. This research uses an approach similar to the Knowledge Discovery in Databases (KDD) methodology to develop an ICT-based framework which can be used to facilitate the acquisition of knowledge for farmers' decision making processes. This is one of the leading areas of research and development for information technology in an agricultural industry, which is yet to utilize such technologies fully. The Farmer Knowledge and Decision Support Framework (FKDSF) …
An Information-Based Decision Support Framework For Eagriculture, Leisa Armstrong, Dean Diepeveen
Leisa Armstrong
The ability of farmers to acquire knowledge to make decisions is limited by the information quality and applicability. An inconsistency in information delivery and standards for the integration of information also limits the decision making process. The Knowledge Discovery in Databases (KDD) methodology described for data mining is an example of how frameworks can be used to facilitate such data integration. This research will examine how such an ICT-based framework can be used to facilitate the acquisition of knowledge for the farmer decision making process. The Farmer Knowledge and Decision Support Framework (FKDSF) takes information provided to farmers and …
Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han
Theses and Dissertations--Computer Science
This thesis presents a new algorithm to predict the pre-microRNA secondary structure. An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions.
It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNA is more important …
Medical Data Analysis Method For Epilepsy, Ameen Eetemadi
Wayne State University Theses
Applying data mining techniques to medical databases which contain unstructured and semi-structured data is a challenging task. It is not only due to the complexity of such databases but also due to the characteristics of the medical domain. This thesis describes how multiple layers of data mining techniques have been applied to a Human Brain Image Database system. It starts with data preparation which paves the way for conventional data analysis techniques to be applied to the data. A similarity based patient retrieval tool has been designed and developed to assist in treatment planning and outcome estimation for epileptic patients. …
Decision Rule Induction For Service Sector Using Data Mining- A Rough Set Theory Approach, Zhonghua Hu
Open Access Theses & Dissertations
Nowadays, data mining is more widely used than ever before, not only in academia but also in industry and business. Apart from execution of business processes, the creation of a knowledge base and its utilization for the benefit of the organization is becoming a strategic tool to compete. Despite having ever-growing databases, the problem is that the finance company fails to fully capitalize on the true benefits which can be gained from this great wealth of information. The data mining technology instead of classic statistical analysis is developed to help the people to discover the …
Redistricting Using Constrained Polygonal Clustering, Deepti Joshi, Leen-Kiat Soh, Ashok Samal
School of Computing: Faculty Publications
Redistricting is the process of dividing a geographic area consisting of spatial units—often represented as spatial polygons—into smaller districts that satisfy some properties. It can therefore be formulated as a set partitioning problem where the objective is to cluster the set of spatial polygons into groups such that a value function is maximized [1]. Widely used algorithms developed for point-based data sets are not readily applicable because polygons introduce the concepts of spatial contiguity and other topological properties that cannot be captured by representing polygons as points. Furthermore, when clustering polygons, constraints such as spatial contiguity and unit distributedness should …
Hypotheses Generation As Supervised Link Discovery With Automated Class Labeling On Large-Scale Biomedical Concept Networks, Jayasimha R. Katukuri, Ying Xie, Vijay Raghavan, Ashish Gupta
Faculty and Research Publications
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using Map-Reduce framework. We extract a set of heterogeneous features such as random …