Clustering And Validation Of Microarray Data Using Consensus Clustering, 2010 San Jose State University
Clustering And Validation Of Microarray Data Using Consensus Clustering, Sarbinder Kallar
Clustering is a popular method for gleaning useful information from microarray data. Unfortunately, the results obtained from common clustering algorithms are not consistent, and even with multiple runs of different algorithms a further validation step is required. Due to the absence of well-defined class labels and an unknown number of clusters, the unsupervised learning problem of finding an optimal clustering is hard. Obtaining a consensus of judiciously obtained clusterings not only provides stable results but also lends a high level of confidence in the quality of the results. Several base algorithm runs are used to generate clusterings, and a co-association matrix of ...
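The co-association idea described above can be sketched in a few lines: run a base clustering algorithm several times and record, for each pair of points, the fraction of runs in which they land in the same cluster. This is a minimal illustration assuming k-means as the base algorithm, not the thesis's actual implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Plain Lloyd's k-means; returns a label vector of length len(X)."""
    rng = rng or np.random.default_rng()
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(X, k, runs=30, seed=0):
    """Fraction of base runs in which each pair of points co-clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    C = np.zeros((n, n))
    for _ in range(runs):
        labels = kmeans(X, k, rng=rng)
        C += (labels[:, None] == labels[None, :])
    return C / runs
```

The resulting matrix is symmetric with a unit diagonal; a final consensus clustering is typically obtained by running a (e.g. hierarchical) clustering step on `1 - C` as a distance matrix.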
Open Source Analysis Of Biomedical Figures, 2010 San Jose State University
Open Source Analysis Of Biomedical Figures, David Shao
With a selection of biomedical literature available for open access, a natural pairing seems to be the use of open source software to automatically analyze content, in particular, the content of figures. Considering the large number of possible tools and approaches, we choose to focus on the recognition of printed characters. As the problem of optical character recognition (OCR) under reasonable conditions is considered to be solved, and as open source software is fully capable of isolating the location of characters and identifying most of them accurately, we instead use OCR as an application area for the relatively recent ...
Email Data Mining: An Approach To Construct An Organization Position-Wise Structure While Performing Email Analysis, 2010 San Jose State University
Email Data Mining: An Approach To Construct An Organization Position-Wise Structure While Performing Email Analysis, Bhargav Vadher
In this age of social networking, it is necessary to define the relationships among the members of a social network. Various techniques are already available to define user-to-user relationships across a network. Over time, many algorithms and machine learning techniques have been applied to find relationships over social networks, yet very few techniques and little information are available for defining a relation directly over raw email data. A few academic groups have developed ways to mine email log files and have found the inter-relations between users by means of clusters. Still, there is no solid technique available that can ...
Mobile Search Engine Using Clustering And Query Expansion, 2010 San Jose State University
Mobile Search Engine Using Clustering And Query Expansion, Huy Nguyen
Internet content is growing exponentially, and searching for useful content is a tedious task that we all deal with today. Mobile phones' lack of screen space and limited interaction methods make traditional search engine interfaces very inefficient. As the use of the mobile internet continues to grow, there is a need for an effective search tool. I have created a mobile search engine that uses clustering and query expansion to find relevant web pages efficiently. Clustering organizes web pages into groups that reflect different components of a query topic. Users can ignore clusters that they find irrelevant so they are not ...
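One common form of query expansion is pseudo-relevance feedback: enrich the query with the most frequent terms from the top-ranked documents. The sketch below illustrates that general technique only; the function name and scoring are invented and need not match this thesis's method:

```python
from collections import Counter

def expand_query(query, top_docs, n_extra=3):
    """Hypothetical pseudo-relevance feedback: append the n_extra most
    frequent non-query terms found in the top-ranked documents."""
    q_terms = set(query.lower().split())
    counts = Counter(
        w for doc in top_docs for w in doc.lower().split() if w not in q_terms
    )
    return query.split() + [w for w, _ in counts.most_common(n_extra)]
```

The expanded term list is then re-submitted to the ranking function, which tends to pull in relevant pages that never contained the user's original (often very short) mobile query verbatim.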
How Smart Is Your Android Smartphone?, 2010 San Jose State University
How Smart Is Your Android Smartphone?, Deepika Mulani
Smart phones are ubiquitous today. These phones generally have access to sensitive personal information and, consequently, they are a prime target for attackers. A virus or worm that spreads over the network to cell phone users could be particularly damaging. Due to a rising demand for secure mobile phones, manufacturers have increased their emphasis on mobile security. In this project, we address some security issues relevant to the current Android smartphone framework. Specifically, we demonstrate an exploit that targets the Android telephony service. In addition, as a defense against the loss of personal information, we provide a means to encrypt ...
Improved Software Activation Using Multithreading, 2010 San Jose State University
Improved Software Activation Using Multithreading, Jian Rui Zhang
Software activation is an anti-piracy technology designed to verify that software products have been legitimately licensed. It is supposed to be quick and simple while simultaneously protecting customer privacy. The most common form of software activation is the entry by the user of a legitimate product serial number, also known as a product key. This technique is employed by a wide range of software, from small shareware programs to large commercial programs such as Microsoft Office. However, software activation based on a serial number appears to be weak, as cracks for a majority of programs are available and can be ...
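A serial-number check of the kind described can be illustrated with a keyed hash: the vendor derives the key from user data with a secret, and the installer verifies it. This is a hypothetical sketch (the secret, key format, and function names are invented, and it is not the scheme studied in this project):

```python
import hmac
import hashlib

SECRET = b"vendor-secret"  # hypothetical vendor-side signing key

def make_key(user_id: str) -> str:
    """Vendor side: derive a product key as user_id plus a truncated HMAC tag."""
    tag = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{user_id}-{tag}"

def verify_key(key: str) -> bool:
    """Client side: recompute the tag and compare in constant time."""
    user_id, _, tag = key.rpartition("-")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(tag, expected)
```

The weakness the abstract points to is visible here: because `verify_key` (and often the secret) ships inside the client binary, a cracker can patch the single branch that tests its result, which motivates stronger schemes such as the multithreaded activation proposed in this project.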
Questioning Cultural Commons, 2010 Georgetown University Law Center
Questioning Cultural Commons, Lawrence B. Solum
Georgetown Law Faculty Publications and Other Works
In Constructing Commons in the Cultural Environment, Michael J. Madison, Brett M. Frischmann, and Katherine J. Strandburg offer an innovative and attractive vision of the future of cultural and scientific knowledge through the construction of “cultural commons,” which they define as “environments for developing and distributing cultural and scientific knowledge through institutions that support pooling and sharing that knowledge in a managed way.” The kind of “commons” they have in mind is modeled on the complex arrangement of social norms that allocate lobstering rights among fishermen in Maine and extends to arrangements such as patent pools, open-source software development (e ...
Delay, Cost And Infrastructure Tradeoff Of Epidemic Routing In Mobile Sensor Networks, 2010 Iowa State University
Delay, Cost And Infrastructure Tradeoff Of Epidemic Routing In Mobile Sensor Networks, Shan Zhou, Lei Ying, Srikanta Tirthapura
Electrical and Computer Engineering Conference Papers, Posters and Presentations
This paper studies the delay, cost and infrastructure tradeoff of epidemic routing in mobile sensor networks. We consider a mobile sensor network with M mobiles and B static base stations. The mobile sensors collect information when moving around and need to report the information to the base stations. Three different epidemic routing schemes --- target epidemic routing, uncontrolled epidemic routing and controlled epidemic routing --- are analyzed in this paper. For each of the three schemes, we characterize the scaling behaviors of the delay, which is defined to be the average number of time slots required to deliver a message, and the ...
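The delay metric defined above can be explored with a toy discrete-time simulation of uncontrolled epidemic routing: each slot, every message-carrying mobile may meet a base station (delivering the message) or another mobile (copying it). The contact model and parameters below are invented for illustration and are much simpler than the paper's analysis:

```python
import random

def epidemic_delay(M, B, meet_prob, seed=0):
    """Toy model: slots until any of the `infected` (message-carrying)
    mobiles meets one of B base stations. Each slot, each carrier meets a
    base with prob B*meet_prob, and a random other mobile with meet_prob."""
    rng = random.Random(seed)
    infected = 1  # the source mobile holds the message initially
    slots = 0
    while True:
        slots += 1
        for _ in range(infected):
            if rng.random() < B * meet_prob:
                return slots  # delivered to the infrastructure
        # epidemic copying among mobiles (uncontrolled: copy on every contact)
        new = sum(1 for _ in range(infected)
                  if rng.random() < meet_prob * (M - infected) / M)
        infected = min(M, infected + new)
```

Averaging over many seeds shows the expected infrastructure tradeoff: adding base stations (larger B) shortens the delivery delay at the cost of more infrastructure.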
Model Based Analysis Of Some High Speed Network Issues, 2010 Louisiana State University and Agricultural and Mechanical College
Model Based Analysis Of Some High Speed Network Issues, Suman Kumar
LSU Doctoral Dissertations
The study of complex problems in science and engineering today typically involves large-scale data, and a large number of scientific breakthroughs depends critically on large multi-disciplinary and geographically dispersed research teams, for which the high-speed network becomes an integral part. To serve the ongoing bandwidth requirements and scalability of these networks, there has been a continuous evolution of TCP variants for high-speed networks. Testing these protocols on a real network would be expensive and time consuming, and such networks are moreover not easily available to researchers worldwide. Network simulation is a well-accepted and widely used method for performance evaluation; it is well ...
The Impact Of Overfitting And Overgeneralization On The Classification Accuracy In Data Mining, 2010 Louisiana State University and Agricultural and Mechanical College
The Impact Of Overfitting And Overgeneralization On The Classification Accuracy In Data Mining, Huy Nguyen Anh Pham
LSU Doctoral Dissertations
Current classification approaches usually do not try to achieve a balance between fitting and generalization when they infer models from training data. Such approaches ignore the possibility of different penalty costs for the false-positive, false-negative, and unclassifiable types. Thus, their performances may not be optimal or may even be coincidental. This dissertation analyzes the above issues in depth. It also proposes two new approaches called the Homogeneity-Based Algorithm (HBA) and the Convexity-Based Algorithm (CBA) to address these issues. These new approaches aim at optimally balancing the data fitting and generalization behaviors of models when some traditional classification approaches are used ...
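The asymmetric penalty costs mentioned above can be made concrete with a simple total-cost function over predictions that may be positive, negative, or unclassifiable. This is a generic illustration of the evaluation criterion, not the HBA or CBA algorithms themselves:

```python
def total_penalty(preds, truth, c_fp, c_fn, c_u):
    """Sum penalty costs over predictions.
    preds: 1 (positive), 0 (negative), or None (unclassifiable).
    truth: 1 or 0. c_fp, c_fn, c_u: costs for false positives,
    false negatives, and unclassifiable cases respectively."""
    cost = 0.0
    for p, t in zip(preds, truth):
        if p is None:
            cost += c_u          # model declined to classify
        elif p == 1 and t == 0:
            cost += c_fp         # false positive
        elif p == 0 and t == 1:
            cost += c_fn         # false negative
    return cost
```

Under such a criterion, a model tuned only for raw accuracy can be far from cost-optimal: when c_fn is much larger than c_u, it may be cheaper to leave borderline cases unclassified than to risk a false negative.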
Choosing Between Remote I/O Versus Staging In Distributed Environments, 2010 Louisiana State University and Agricultural and Mechanical College
Choosing Between Remote I/O Versus Staging In Distributed Environments, Ibrahim Hakki Suslu
LSU Doctoral Dissertations
Today, scientific applications and experiments have become increasingly complex and more demanding in terms of their computational and data requirements. The amount of data generated and used has grown at a very rapid rate. Tens or hundreds of terabytes of data for a single application are very common today; petabytes and even exabytes of data will be very common in a few years. One of the major challenges in distributed computing environments is how to access these large datasets remotely over the network. Data staging and remote I/O are the most widely used data access methods for distributed ...
Application-Level Optimization Of End-To-End Data Transfer Throughput, 2010 Louisiana State University and Agricultural and Mechanical College
Application-Level Optimization Of End-To-End Data Transfer Throughput, Esma Yildirim
LSU Doctoral Dissertations
For large-scale distributed applications, effective use of available network throughput and optimization of data transfer speed are crucial for end-to-end application performance. Today, many regional and national optical networking initiatives such as LONI, ESnet and Teragrid provide high-speed network connectivity to their users. However, the majority of users fail to obtain even a fraction of the theoretical speeds promised by these networks due to issues such as sub-optimal protocol tuning, disk bottlenecks on the sending and/or receiving ends, and processor limitations. This implies that having high-speed networks in place is important but not sufficient for the improvement ...
Towards Context-Aware Real-Time Information Dissemination, 2010 Binghamton University--SUNY
Towards Context-Aware Real-Time Information Dissemination, Kyoung-Don Kang, Greg Vert
Computer Science Faculty Scholarship
Real-time information dissemination is essential for the success of key applications such as transportation management and battlefield monitoring. In these applications, relevant information should be disseminated to interested users in a timely fashion. However, it is challenging to support timely information dissemination due to the limited and even time-varying network bandwidth. Thus, a naive approach that disseminates every piece of data, with no consideration of the context describing where and when the data was acquired and how it can satisfy users, may provide only poor performance and poor user-perceived quality of service (QoS). To address the problem, we design a novel context-aware ...
Analysis Avoidance Techniques Of Malicious Software, 2010 Edith Cowan University
Analysis Avoidance Techniques Of Malicious Software, Murray Brand
Theses: Doctorates and Masters
Anti-Virus (AV) software generally employs signature matching and heuristics to detect the presence of malicious software (malware). The generation of signatures and the determination of heuristics depend upon an AV analyst having successfully determined the nature of the malware, not only for recognition purposes but also for the determination of infected files and startup mechanisms that need to be removed as part of the disinfection process. If a specimen of malware has not previously been extensively analyzed, it is unlikely to be detected by AV software. In addition, malware is becoming increasingly profit driven and more likely to incorporate ...
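Signature matching in its simplest form is an exact match against a database of digests of known-malicious files. The sketch below is a deliberately naive illustration (the sample database is invented), which also makes the abstract's point visible: a specimen that has never been analyzed, or that differs by a single byte, is not detected, which is why heuristics are needed as well:

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature database built from previously analyzed samples.
known_samples = [b"EICAR-like test payload"]
SIGNATURES = {sha256(s) for s in known_samples}

def is_known_malware(data: bytes) -> bool:
    """Exact whole-file signature match: detects only specimens whose
    digest already appears in the database."""
    return sha256(data) in SIGNATURES
```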
Automatic Readability Assessment, 2010 The Graduate Center, City University of New York
Automatic Readability Assessment, Lijun Feng
All Dissertations, Theses, and Capstone Projects
We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling ...
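Feature extraction of the kind described often starts from surface statistics such as sentence length, word length, and vocabulary variety. The three features below are illustrative stand-ins, not the tool's actual feature inventory:

```python
import re

def readability_features(text):
    """Extract a few surface-level readability features (illustrative only)."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "avg_sentence_len": len(words) / max(len(sents), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        # type-token ratio: distinct words over total words
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }
```

Such per-document feature vectors, paired with grade-level annotations, are what a supervised classifier is trained on; richer features (e.g. language-model perplexity or parse depth) slot into the same vector.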
Utilizing The Technology Acceptance Model To Assess The Employee Adoption Of Information Systems Security Measures, 2010 Nova Southeastern University
Utilizing The Technology Acceptance Model To Assess The Employee Adoption Of Information Systems Security Measures, Cynthia M. Jones, Richard V. Mccarthy, Leila Halawi, Bahaudin Mujtaba
In this study, the factors that affect employee acceptance of information systems security measures were examined by extending the Technology Acceptance Model. Partial least squares structural equation modeling was applied to examine these factors, using 174 valid responses from employees at companies in various industry segments in the United States and Canada. The results of the statistical analysis indicate that subjective norm, moderated by management support, showed the strongest effect on intention to use information systems security measures.
Book Review: Digital Forensic Evidence Examination, 2010 Gary Kessler Associates
Book Review: Digital Forensic Evidence Examination, Gary C. Kessler
This document is Dr. Kessler's review of the second edition of Digital Forensic Evidence Examination by Fred Cohen. ASP Press, 2010. ISBN: 978-1-878109-45-3
Forensic Analysis Of A Playstation 3 Console, 2010 University of Central Florida
Forensic Analysis Of A Playstation 3 Console, Scott Conrad, Greg Dorn, Philip Craiger
The Sony PlayStation 3 (PS3) is a powerful gaming console that supports Internet-related activities, local file storage and the playing of Blu-ray movies. The PS3 also allows users to partition and install a secondary operating system on the hard drive. This “desktop-like” functionality along with the encryption of the primary hard drive containing the gaming software raises significant issues related to the forensic analysis of PS3 systems. This paper discusses the PS3 architecture and behavior, and provides recommendations for conducting forensic investigations of PS3 systems.
Implementation And Analysis Of A Top-K Retrieval System For Strings, 2010 Louisiana State University and Agricultural and Mechanical College
Implementation And Analysis Of A Top-K Retrieval System For Strings, Sabrina Chandrasekaran
LSU Master's Theses
Given a text that is the union of d documents of strings, D = {d1, d2, ..., dd}, the emphasis of this thesis is to provide a practical framework to retrieve the K most relevant documents for a given pattern P, which comes as a query. This cannot be done directly, as going through every occurrence of the query pattern may prove expensive if the number of documents in which the pattern occurs is much larger than the number of documents (K) that we require. Some advanced query functionality will be required, as compared to listing the documents that the pattern ...
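The inefficiency described above is easy to see in the naive baseline, which scans every document for every occurrence of the pattern before keeping only K of them. This sketch (with occurrence count as an assumed relevance score) is the baseline such a framework improves upon, not the thesis's data structure:

```python
import heapq

def topk_docs(docs, pattern, K):
    """Naive baseline: score every document by its occurrence count of
    `pattern`, then keep the K highest-scoring documents that match."""
    scores = ((doc.count(pattern), i) for i, doc in enumerate(docs))
    return [i for count, i in heapq.nlargest(K, scores) if count > 0]
```

The cost is linear in the total text size regardless of K; the point of a top-K retrieval structure is to answer the same query in time closer to proportional to K and the pattern length.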
An Adaptable Group Communication System, 2010 Louisiana State University and Agricultural and Mechanical College
An Adaptable Group Communication System, Vikram Reddy Kayathi
LSU Master's Theses
Existing group communication systems such as ISIS, Spread, and JGroups provide group communication in a synchronous environment. They are built on top of TCP/IP or UDP and guarantee virtual synchrony and consistency. However, wide-area distributed systems are inherently asynchronous, and existing group communication systems are not suitable for wide-area deployment. They do not provide persistent communication; i.e., if a node gets temporarily disconnected, all messages directed to that node during that period are lost. Hence such systems are not suitable for deployment in disadvantaged networks. While, according to Brewer's CAP theorem, it is impossible for a distributed ...
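The persistence gap described above, where messages to a temporarily disconnected member are lost, can be closed by store-and-forward queuing. The class below is a toy sketch of that idea only (names and API are invented, and it says nothing about ordering or virtual synchrony guarantees):

```python
from collections import defaultdict, deque

class PersistentGroup:
    """Toy store-and-forward group channel: messages addressed to a
    disconnected member are queued and delivered when it reconnects."""

    def __init__(self):
        self.online = set()
        self.pending = defaultdict(deque)  # queued messages per offline node
        self.inbox = defaultdict(list)     # delivered messages per node

    def join(self, node):
        self.online.add(node)
        while self.pending[node]:          # flush everything queued while away
            self.inbox[node].append(self.pending[node].popleft())

    def leave(self, node):
        self.online.discard(node)

    def multicast(self, members, msg):
        for m in members:
            if m in self.online:
                self.inbox[m].append(msg)
            else:
                self.pending[m].append(msg)
```

In a real wide-area system the pending queues would live in durable storage on intermediary nodes, which is where the CAP tension appears: persistence and availability during partitions are bought by relaxing the consistency that synchronous systems guarantee.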