Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

Bi-Level Clustering Of Mixed Categorical And Numerical Biomedical Data, Bill Andreopoulos, Aijun An, Xiaogang Wang Jun 2006

Faculty Publications, Computer Science

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for Bi-Level Clustering of Mixed categorical and numerical data types. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than when using one type alone.


Introducing Semantics In Web Personalization: The Role Of Ontologies, Magdalini Eirinaki, Dimitrios Mavroeidis, George Tsatsaronis, Michalis Vazirgiannis Jan 2006

Faculty Publications

Web personalization is the process of customizing a web site to the needs of each specific user or set of users. Personalization of a web site may be performed by providing recommendations to the users, highlighting or adding links, creating index pages, etc. Web personalization systems are mainly based on the exploitation of the navigational patterns of the web site’s visitors. When a personalization system relies solely on usage-based results, however, valuable information conceptually related to what is finally recommended may be missed. The exploitation of the web pages’ semantics can considerably improve the results of web usage …


Concept Based Document Clustering Using A Simplicial Complex, A Hypergraph, Kevin Lind Jan 2006

Master's Projects

This thesis evaluates the effectiveness of using a combinatorial topology structure (a simplicial complex) for document clustering. It is believed that a simplicial complex better identifies the latent concept space defined by a collection of documents than the use of hypergraphs or human categorization. The complex is constructed using groups of co-occurring words (term associations) identified using traditional data mining methods. Disjoint subsections of the complex (connected components) represent general concepts within the documents’ concept space. Documents clustered to these connected components will produce meaningful groupings. Instead, the most specific concepts (maximal simplices) are used as representative connected components to …


Bluetooth Security Protocol Analysis And Improvements, Chi Shing Lee Jan 2006

Master's Projects

Since its creation, Bluetooth has transformed itself from a cable replacement technology to a wireless technology that connects people and machines. Bluetooth has been widely adopted on mobile phones and PDAs. Many vendors in other industries are integrating Bluetooth into their products. Although vendors are adopting the technology, Bluetooth hasn’t been a big hit among users. Security remains a major concern. Poor implementation of the Bluetooth architecture on mobile devices has led to some high-profile Bluetooth hacks. Weak security protocol designs expose the Bluetooth system to some devastating protocol attacks. This paper first explores four Bluetooth protocol-level attacks …


Engineering Enterprise Software Systems With Interactive UML Models And Aspect-Oriented Middleware, Paul Nguyen Jan 2006

Master's Projects

Large-scale enterprise software systems are inherently complex and hard to maintain. To deal with this complexity, current mainstream software engineering practices aim at raising the level of abstraction to visual models described in OMG’s UML modeling language. Current UML tools, however, produce static design diagrams for documentation which quickly become out of sync with the software, and thus obsolete. To address this issue, current model-driven software development approaches aim at software automation using generators that translate models into code. However, these solutions don’t have a good answer for dealing with legacy source code and the evolution of existing enterprise software systems. …


Visualization Of Secondary RNA Structure Prediction Algorithms, Brandon Hunter Jan 2006

Master's Projects

This chapter introduces the secondary structure prediction problem: what it is and why it is important. Given the importance of the algorithm, a clear means of visually representing the problem is essential. This chapter therefore details the high-level goals of the visualization and how it will represent the problem through several simultaneous representations. These representations will be tied together in order to increase the understanding of the algorithm.


A Meaningful MD5 Hash Collision Attack, Narayana D. Kashyap Jan 2006

Master's Projects

Wang et al. showed that the MD5 hash is no longer secure by proposing an attack that generates two different messages with the same MD5 sum. Many conditions need to be satisfied to attain this collision. Vlastimil Klima then proposed a faster, more efficient technique to implement this attack. We use these techniques to first create a collision attack and then use these collisions to implement meaningful collisions by creating two different packages that give identical MD5 hashes but, when extracted, each produce different files with contents specified by the attacker.
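
As a minimal illustration of what a collision means in this abstract, the following Python sketch checks whether two byte strings differ yet share an MD5 digest. The helper name `is_md5_collision` is hypothetical and does not come from the project; it relies only on the standard `hashlib` module and does not implement the Wang or Klima attack.

```python
import hashlib


def is_md5_collision(a: bytes, b: bytes) -> bool:
    """Return True when a and b are different inputs with the same MD5 digest.

    Hypothetical helper for illustration only: it verifies a claimed
    collision, it does not search for one.
    """
    if a == b:
        # Identical inputs trivially hash alike; that is not a collision.
        return False
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()
```

A pair of packages produced by a collision attack would make this check return `True`, whereas ordinary unrelated inputs are expected to return `False`.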


Compact Representation Of Association Rule, Mien K. Siao Jan 2006

Master's Projects

A bitmap is an extremely efficient way of representing data, but the drawback is that the order of data is fixed in a bitmap. Granular computing is a new theory that frees the bitmap method from a fixed order of data in the same manner as linear algebra frees matrix theory from a fixed basis. Obtaining meaningful information using data mining techniques has been a central idea in recent database applications [2]. One of the core techniques in data mining is to find associations (undirected association rules) between attribute values [4]. The complexity of finding associations is often very high. …


Finding Optimal Reduct For Rough Sets By Using A Decision Tree Learning Algorithm, Xin Li Jan 2006

Master's Projects

Rough Set theory is a mathematical theory for classification based on structural analysis of relational data. It can be used to find the minimal reduct, which is the minimal knowledge representation for the relational data. The theory has been successfully applied to various domains in data mining. However, a major limitation of Rough Set theory is that finding the minimal reduct is an NP-hard problem. C4.5 is a very popular decision-tree learning algorithm. It is very efficient at generating a decision tree. This project uses the decision tree generated by C4.5 to find the optimal reduct for a relational …


Geometry-Based Detection Of Flash Worms, Sang Soo Kim Jan 2006

Master's Projects

While it takes traditional Internet worms hours to infect all the vulnerable hosts on the Internet, a flash worm takes seconds. Because flash worms spread so rapidly, existing worm defense mechanisms cannot respond fast enough to detect and stop flash worm infections. In this project, we propose a geometry-based detection mechanism that can detect the spread of flash worms in a short period of time. We tested the mechanism on various sets of simulated flash worm traffic consisting of more than 10,000 nodes. In addition to testing on flash worm traffic, we also tested the mechanism …


Implementing Built-In Properties For The Java Programming Language, Alexandre Alves Jan 2006

Master's Projects

The purpose of this project is to improve the programming experience of using the Java language by implementing properties as a built-in facility. In this project, the Java compiler tool and the Java documentation tool were modified. In addition, a new Java annotation processor that generates Java BeanInfo source files was created. These new features result in a more productive development environment for the Java programming language.


Analysis And Detection Of Metamorphic Computer Viruses, Wing Wong Jan 2006

Master's Projects

Virus writers and anti-virus researchers generally agree that metamorphism is the way to generate undetectable viruses. Several virus writers have released virus creation kits and claimed that they possess the ability to automatically produce morphed virus variants that look substantially different from one another. To see how effective these code morphing engines are, and how much difference exists between variants of the same virus, we measured the similarity between virus variants generated by four virus generators downloaded from the Internet. Our results show that the effectiveness of these generators varies widely. While the best generator, NGVCK, is able to create …


Automatic Extraction Of Keywords And Co-Occurrence Keyword Sets, Mong-Hang Vo Jan 2006

Master's Projects

Internet search has become an essential part of almost everyone’s daily life and work. To make wise personal and business decisions in a timely fashion, one must access the most relevant information efficiently. Because the amount of information on the Internet is enormous, it is important that a search engine rank the information appropriately when it presents search results to users. Latent Semantic Indexing (LSI) addresses relevance ranking based on how significant a search word is in each document. Some innovative approaches to computing higher-dimensional LSI (HD-LSI) were explored in this project. In traditional LSI, the term frequency-inverse document …


Scalable Energy-Efficient Routing In Mobile Ad Hoc Network, Rashmi Kukanur Jan 2006

Master's Projects

Quick deployment without any existing infrastructure makes mobile ad hoc networks (MANET) a striking choice for dynamic situations such as military and rescue operations and disaster recovery. However, routing remains one of the major issues in MANET due to the highly dynamic and distributed environment. Energy consumption is also a significant issue in ad hoc networks since the nodes are battery powered. This report discusses some major dominating-set-based approaches to energy-efficient routing in mobile ad hoc networks. It also presents the performance results for each of these approaches in …


Enhancing TCP Performance In Wired-Cum-Wireless Networks, Shruthi B. Krishnan Jan 2006

Master's Projects

The increasing popularity of mobile devices has prompted industrial and academic research towards improving the performance of wireless applications. Transmission Control Protocol (TCP) plays an important role in defining a network’s performance, and its use in wireless networks has exposed several inadequacies in its operation. Tight coupling of TCP’s error and congestion control mechanisms has proven to be incompatible with the unique characteristics of wireless channels. TCP, designed for wired networks, assumes any packet loss to be an indication of congestion in the network. Wireless networks exhibit a higher bit error rate, low and varying bandwidth, and disconnections of hosts …


Juice: An SVG Rendering Peer For Java Swing, Ignatius Yuwono Jan 2006

Master's Projects

SVG, a W3C XML standard, is a relatively new language for describing low-level vector drawings. Due to its cross-platform capabilities and support for events, SVG may potentially be used in interactive GUIs/graphical front-ends. However, a complete and full-featured widget set for SVG does not exist at the time of this writing. I have researched and implemented a framework which retargets a complete and mature raster-based widget library, the JFC Swing GUI library, to a vector-based display substrate: SVG. My framework provides SVG with a full-featured widget set and augments Swing’s platform coverage. Furthermore, by using bytecode instrumentation techniques, my Swing to …


Authoring XML Documents With XHTML And MathML Support, Xiaoheng Wu Jan 2006

Master's Projects

Since the late 1970s, a large number of scientific documents have been authored in TeX or its derivatives such as LaTeX. These typesetting systems allow anybody to write high-quality books and articles. However, the TeX syntax is not compatible with HTML or XML, and the World Wide Web Consortium's answer is MathML. The primary goal of MathML is to enable mathematical documents to be communicated, exchanged, and processed on the Web. Therefore, MathML documents are usually embedded in XHTML documents. Currently, there are several XHTML+MathML editors. The most popular editors use two common approaches. The first approach offers a What-You-See-Is-What-You-Get (WYSIWYG) interface. …


A Fast Algorithm For Data Mining, Aarathi Raghu Jan 2006

Master's Projects

In the past few years, there has been keen interest in mining frequent itemsets in large data repositories. Frequent itemsets are sets of items that occur frequently in the transactions of a database. Several novel algorithms have been developed recently to mine closed frequent itemsets, which are a subset of the frequent itemsets. These algorithms are of practical value: they can be applied to real-world applications to extract patterns of interest in data repositories. However, prior to using an algorithm in practice, it is necessary to know its performance as well as its implementation issues. In this project, …
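
The relationship between frequent and closed frequent itemsets can be sketched with a brute-force Python example. It assumes the usual definition that a frequent itemset is closed when no proper superset has the same support count. The function names and the toy transactions are illustrative only and are not the algorithms evaluated in the project.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each itemset appearing in at least min_support transactions to its count.

    Brute-force enumeration, suitable only for tiny examples.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
                found_any = True
        if not found_any:
            # Supersets of infrequent itemsets cannot be frequent, so stop early.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets with no proper superset of equal support."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in frequent.items())
    }


# Toy data (assumed for illustration): four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

In this toy data, `{b}` is frequent but not closed because `{a, b}` occurs in exactly the same two transactions, which is why the closed sets form a smaller, lossless summary of the frequent ones.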


Study Of RNA Secondary Structure Prediction Algorithms, Lisa Yu Jan 2006

Master's Projects

Dynamic programming algorithms such as the Nussinov algorithm and the Zuker algorithm define criteria to search for the most stable RNA secondary structures. Stochastic Context-Free Grammar (SCFG) predicts the most probable RNA secondary structure using a context-free grammar and a defined set of probabilities for each grammar rule. These algorithms form the basis for using computer programs to predict RNA secondary structures without pseudoknots. In this report, we review these RNA secondary structure prediction algorithms and present our own software implementations of these algorithms. The Nussinov algorithm is easy to understand, but our results show that the Nussinov algorithm is overly simplified and can …
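
For orientation, a minimal Python sketch of Nussinov-style base-pair maximization follows. It assumes only Watson-Crick pairs (A-U and G-C), scores each pair as 1, and exposes a hypothetical `min_loop` parameter for the minimum number of unpaired bases between two paired positions. It is not the author's implementation, and it ignores pseudoknots and energy terms.

```python
def nussinov_max_pairs(seq, min_loop=0):
    """Return the maximum number of nested base pairs in seq.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base i stays unpaired.
            best = dp[i + 1][j]
            # Case 2: base i pairs with some k, splitting the interval in two.
            for k in range(i + 1 + min_loop, j + 1):
                if (seq[i], seq[k]) in pairs:
                    inside = dp[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    outside = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inside + outside)
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))              # 3
print(nussinov_max_pairs("GGGAAAUCC", min_loop=3))  # 2
```

Counting pairs alone treats every pair as equally favorable, which is one way to see why the abstract calls the approach overly simplified compared with energy-based methods such as Zuker's.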


Clustering High Dimensional Data Using SVM, Tam P. Ngo Jan 2006

Master's Projects

The Web contains a massive number of documents from across the globe, to the point where it has become impossible to classify them manually. This project’s goal is to find a new method for clustering documents that is as close to human classification as possible and at the same time reduces the size of the documents. This project uses a combination of Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD) calculation as well as Support Vector Machine (SVM) classification. With SVD, data sets are decomposed and can be truncated to reduce the data set size. The reduced data set …


Cooperative Interval Caching In Clustered Multimedia Servers, Kim Tran Jan 2006

Master's Projects

In this project, we design a cooperative interval caching (CIC) algorithm for clustered video servers, and evaluate its performance through simulation. The CIC algorithm describes how distributed caches in the cluster cooperate to serve a given request. With CIC, a clustered server can accommodate nearly twice as many cached streams (95% more) as a clustered server without cache cooperation. CIC uses two major processes to find available cache space for a given request in the cluster: finding the server that contains information about the request preceding the given request; and finding another server which may have …