Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 13 of 13

Full-Text Articles in Databases and Information Systems

A Visual Approach To Automated Text Mining And Knowledge Discovery, Andrey A. Puretskiy Dec 2010

Doctoral Dissertations

The focus of this dissertation has been on improving the non-negative tensor factorization technique of text mining. The improvements have been made in both pre-processing and post-processing stages, with the goal of making the non-negative tensor factorization algorithm accessible to the casual user. The improved implementation allows the user to construct and modify the contents of the tensor, experiment with relative term weights and trust measures, and experiment with the total number of algorithm output features. Non-negative tensor factorization output feature production is closely integrated with a visual post-processing tool, FutureLens, that allows the user to perform in-depth analysis …


Program Transformations For Information Personalization, Saverio Perugini, Naren Ramakrishnan Oct 2010

Computer Science Faculty Publications

Personalization constitutes the mechanisms necessary to automatically customize information content, structure, and presentation to the end user to reduce information overload. Unlike traditional approaches to personalization, the central theme of our approach is to model a website as a program and conduct website transformation for personalization by program transformation (e.g., partial evaluation, program slicing). The goal of this paper is to study personalization through a program transformation lens and develop a formal model, based on program transformations, for personalized interaction with hierarchical hypermedia. The specific research issues addressed involve identifying and developing program representations and transformations suitable for classes of hierarchical …


A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse Aug 2010

Dr. Huanjing Wang

Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …


A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Aug 2010

Dr. Huanjing Wang

One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …
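
The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which rankers the paper compares, so the scoring function here (absolute Pearson correlation with the target) is an assumption chosen for simplicity, and the feature names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return feature names ordered from most to least relevant to the target.

    Relevance is |Pearson correlation|, an assumed stand-in for whatever
    ranker is actually used; each feature is scored independently of the others.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical software metrics and a binary fault label.
columns = {
    "loc": [10, 20, 30, 40],
    "noise": [5, 1, 4, 3],
    "constant": [1, 1, 1, 1],
}
target = [0, 0, 1, 1]

ranking = rank_features(columns, target)
top_k = ranking[:2]  # keep the k most relevant features for model building
```

Because each feature is scored on its own, a filter of this kind can flag irrelevant features but does not by itself detect redundancy between two features that are both relevant.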


A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse Aug 2010

Computer Science Faculty Publications

Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …


A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Aug 2010

Computer Science Faculty Publications

One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …


Choosing Management Information Systems As A Major: Understanding The Smifactors For Mis, Thomas W. Ferratt, Stephen R. Hall, Jayesh Prasad, Donald E. Wynn Aug 2010

MIS/OM/DS Faculty Publications

Given declining management information systems (MIS) enrollments at our university, we seek to understand our students' selection of a major. Prior studies have found that students choose a major based on a number of factors, with subject matter interest consistently being most important. We contribute to the literature by developing a deeper understanding of what is meant by subject matter interest, which we refer to as smiFactors, for MIS as a major and career. Based on a qualitative analysis of open-ended survey questions completed by undergraduate business students, we confirm a number of smiFactors for MIS gleaned from recent studies …


Measurement And Interpolation Of Sea Surface Temperature And Salinity In The Tropical Pacific: A 9,000 Nautical Mile Research Odyssey, Amber Brooks Jun 2010

Earth and Soil Sciences

The purpose of this project was to compare spline and inverse distance weighting interpolation tools on data collected in the tropical Pacific Ocean by ship and data from a global network of CTD floats, known as Argo floats (fig.1), to provide evidence that technological advancement and integration is aiding our understanding of the ocean-atmosphere system of planet Earth. Thirty-one sea surface temperature and salinity samples were manually taken across a 9,000 nautical mile trek of the Pacific Ocean for the months of April, May and June 2008. Argo ASCII globally gridded monthly averaged sea surface temperature and salinity data, from …
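
For readers unfamiliar with inverse distance weighting, one of the two interpolation tools named in this abstract, a minimal sketch follows. It assumes planar coordinates and a power parameter of 2; neither is stated in the abstract, real ocean data would normally call for great-circle distances, and the sample values are hypothetical.

```python
import math

def idw(samples, query, power=2.0):
    """Inverse distance weighting estimate at `query`.

    `samples` is a list of (x, y, value) tuples. Each sample contributes with
    weight 1 / distance**power, so nearby samples dominate the estimate.
    Planar Euclidean distance is an assumption made for illustration.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            return value  # query coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical sea surface temperature samples: (x, y, temperature in deg C).
samples = [(0.0, 0.0, 28.0), (2.0, 0.0, 30.0)]
midpoint_estimate = idw(samples, (1.0, 0.0))
```

At a point equidistant from both samples the weights are equal, so the estimate is their plain average; moving the query toward either sample pulls the estimate toward that sample's value.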


Re-Solving Stochastic Programming Models For Airline Revenue Management, Lijian Chen, Tito Homem-De-Mello Jun 2010

MIS/OM/DS Faculty Publications

We study some mathematical programming formulations for the origin-destination model in airline revenue management. In particular, we focus on the traditional probabilistic model proposed in the literature. The approach we study consists of solving a sequence of two-stage stochastic programs with simple recourse, which can be viewed as an approximation to a multi-stage stochastic programming formulation to the seat allocation problem. Our theoretical results show that the proposed approximation is robust, in the sense that solving more successive two-stage programs can never worsen the expected revenue obtained with the corresponding allocation policy. Although intuitive, such a property is known not …


Capacity-Driven Pricing Mechanism In Special Service Industries, Lijian Chen, Suraj M. Alexander May 2010

MIS/OM/DS Faculty Publications

We propose a capacity driven pricing mechanism for several service industries in which the customer behavior, the price demand relationship, and the competition are significantly distinct from other industries. According to our observation, we found that the price demand relationship in these industries cannot be modeled by fitted curves; the customers would neither plan in advance nor purchase the service strategically; and the competition would be largely local. We analyze both risk neutral and risk aversion pricing models and conclude the proposed capacity driven model would be the optimal solution under mild assumptions. The resulting pricing mechanism has been implemented at …


Personalization By Website Transformation: Theory And Practice, Saverio Perugini May 2010

Computer Science Faculty Publications

We present an analysis of a progressive series of out-of-turn transformations on a hierarchical website to personalize a user’s interaction with the site. We formalize the transformation in graph-theoretic terms and describe a toolkit we built that enumerates all of the traversals enabled by every possible complete series of these transformations in any site and computes a variety of metrics while simulating each traversal therein to qualify the relationship between a site’s structure and the cumulative effect of support for the transformation in a site. We employed this toolkit in two websites. The results indicate that the transformation enables users …


Enterprise Users And Web Search Behavior, April Ann Lewis May 2010

Masters Theses

This thesis describes analysis of user web query behavior associated with Oak Ridge National Laboratory’s (ORNL) Enterprise Search System (hereafter, ORNL Intranet). The ORNL Intranet provides users a means to search all kinds of data stores for relevant business and research information using a single query. The Global Intranet Trends for 2010 Report suggests the biggest current obstacle for corporate intranets is “findability and Siloed content”. Intranets differ from internets in the way they create, control, and share content, which can often make it difficult, and sometimes impossible, for users to find information. Stenmark (2006) first noted studies of corporate …


Supporting Multiple Paths To Objects In Information Hierarchies: Faceted Classification, Faceted Search, And Symbolic Links, Saverio Perugini Jan 2010

Computer Science Faculty Publications

We present three fundamental, interrelated approaches to support multiple access paths to each terminal object in information hierarchies: faceted classification, faceted search, and web directories with embedded symbolic links. This survey aims to demonstrate how each approach supports users who seek information from multiple perspectives. We achieve this by exploring each approach, the relationships between these approaches, including tradeoffs, and how they can be used in concert, while focusing on a core set of hypermedia elements common to all. This approach provides a foundation from which to study, understand, and synthesize applications which employ these techniques. This survey does not …