Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

Consolidated Study On Query Expansion, Abhishek Biruduraju Dec 2011

UNLV Theses, Dissertations, Professional Papers, and Capstones

A typical day for millions of web users all over the world starts with a simple query. The quest for information on a particular topic drives them to search, and the terms they supply in their queries vary from person to person depending on their knowledge. With a vast collection of documents available on the web, the onus is on the retrieval system to return only those documents that are relevant and satisfy the user’s search requirements. The document mismatch problem is resolved by appending extra query terms to the …


Study Of Feature Selection Algorithms For Text-Categorization, Kandarp Dave Dec 2011

UNLV Theses, Dissertations, Professional Papers, and Capstones

This thesis discusses feature selection algorithms for text categorization. Feature selection algorithms are very important, as they can make or break a categorization engine. The feature selection algorithms discussed in this thesis are Document Frequency, Information Gain, Chi Squared, Mutual Information, the NGL (Ng-Goh-Low) coefficient, and the GSS (Galavotti-Sebastiani-Simi) coefficient. The general idea of any feature selection algorithm is to determine the importance of words using some measure that keeps informative words and removes non-informative words, which can then help the text-categorization engine categorize a document, D, into some category, C. These feature selection methods are explained, implemented, …
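As a rough illustration of two of the measures named above, the sketch below computes Document Frequency and the Chi Squared statistic for a term and a category. It is not taken from the thesis: the function names, the toy documents, and the 2x2 contingency-table formulation of Chi Squared are assumptions chosen for illustration.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df

def select_by_df(docs, min_df):
    """Keep the terms that occur in at least min_df documents."""
    df = document_frequency(docs)
    return {term for term, count in df.items() if count >= min_df}

def chi_squared(a, b, c, d):
    """Chi Squared score from a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or category never varies
    return n * (a * d - c * b) ** 2 / denom

# Hypothetical toy corpus, already tokenized.
docs = [
    ["cheap", "loan", "offer"],
    ["loan", "rates", "offer"],
    ["football", "match", "offer"],
]
print(sorted(select_by_df(docs, min_df=2)))  # ['loan', 'offer']
print(chi_squared(10, 0, 0, 10))             # 20.0 (term perfectly predicts category)
print(chi_squared(5, 5, 5, 5))               # 0.0 (term independent of category)
```

A higher Chi Squared score suggests the term and the category are more strongly associated, so terms would typically be ranked by score and the top ones kept.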


ProcessJ: A Process-Oriented Programming Language, Matthew Sowders Dec 2011

UNLV Theses, Dissertations, Professional Papers, and Capstones

Java is a general purpose object-oriented programming language that has been widely adopted. Because of its high adoption rate and its lineage as a C-style language, its syntax is familiar to many programmers. The downside is that Java is not natively concurrent. Volumes have been written about concurrent programming in Java; however, concurrent programming is difficult to reason about within an object-oriented paradigm and so is difficult to get right.

occam-π is a general-purpose process-oriented programming language. Concurrency is part of the theoretical underpinnings of the language. Concurrency is simple to reason about within an occam-π application …


Parallel Machines Scheduling With Applications To Internet Ad-Slot Placement, Shaista Lubna Dec 2011

UNLV Theses, Dissertations, Professional Papers, and Capstones

We consider a class of problems of scheduling independent jobs on identical, uniform, and unrelated parallel machines with the objective of achieving an optimal schedule. The primary focus is on minimizing the maximum completion time of the jobs, commonly referred to as the makespan (C_max). We survey and present examples of uniform machines and their applications to the single-slot and multiple-slot settings based on bids and budgets.
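To make the makespan objective concrete, here is a minimal sketch of the classic Longest Processing Time first (LPT) heuristic for identical machines. It is offered only as an illustration of the objective, not as the method surveyed in the thesis; the function name and the job times are assumptions.

```python
import heapq

def lpt_schedule(job_times, num_machines):
    """Assign jobs to identical machines, longest job first.

    Each job goes to the machine with the smallest current load.
    Returns (makespan, assignment), where assignment[m] lists job indices.
    """
    heap = [(0, m) for m in range(num_machines)]  # (current load, machine id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_machines)]
    jobs = sorted(enumerate(job_times), key=lambda pair: pair[1], reverse=True)
    for job, time in jobs:
        load, machine = heapq.heappop(heap)       # least loaded machine
        assignment[machine].append(job)
        heapq.heappush(heap, (load + time, machine))
    makespan = max(load for load, _ in heap)      # C_max: largest machine load
    return makespan, assignment

# Hypothetical processing times for six independent jobs on two machines.
makespan, assignment = lpt_schedule([7, 5, 4, 3, 3, 2], num_machines=2)
print(makespan)  # 12
```

LPT is a heuristic and does not always find the optimal schedule; in this small example the total work is 24, so a makespan of 12 on two machines happens to be optimal.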

The Internet is an important advertising medium, attracting a large number of advertisers and users. When a user searches for a query, a search engine returns a set of results …


Dynamic Indexing, Viswada Sripathi Dec 2010

UNLV Theses, Dissertations, Professional Papers, and Capstones

In this thesis, we report on index constructions for large document collections to facilitate the task of search and retrieval. We first report on classical static index construction methods and their shortcomings. We then report on dynamic index construction techniques and their effectiveness.


Cloud Storage And Online Bin Packing, Swathi Venigella Aug 2010

UNLV Theses, Dissertations, Professional Papers, and Capstones

Cloud storage is a service provided by some corporations (such as Mozy and Carbonite) to store and back up computer files. We study the problem of allocating memory of servers in a data center based on online requests for storage. Over-the-net data backup has become increasingly easy and cheap due to cloud storage. Given an online sequence of storage requests and a cost associated with serving each request by allocating space on a certain server, one seeks to select the minimum number of servers so as to minimize total cost. We use two different algorithms and propose a third algorithm; we show …
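The abstract does not name its algorithms, so the following is only a sketch of one standard online bin packing heuristic, First Fit, applied to the storage setting. It assumes every server has the same capacity and unit cost, which simplifies the cost model described above; the function name and request sizes are hypothetical.

```python
def first_fit(requests, capacity):
    """Place each request, in arrival order, on the first server with room.

    Returns (placement, servers_used), where placement[i] is the index of
    the server that received request i.
    """
    free = []        # remaining capacity of each opened server
    placement = []
    for size in requests:
        if size > capacity:
            raise ValueError("request exceeds server capacity")
        for i, room in enumerate(free):
            if size <= room:
                free[i] -= size
                placement.append(i)
                break
        else:
            # No opened server can hold the request, so open a new one.
            free.append(capacity - size)
            placement.append(len(free) - 1)
    return placement, len(free)

# Hypothetical request sizes arriving online, with servers of capacity 10.
placement, servers_used = first_fit([4, 8, 1, 4, 2, 1], capacity=10)
print(placement)     # [0, 1, 0, 0, 1, 0]
print(servers_used)  # 2
```

Because requests are placed as they arrive and never moved, the result depends on arrival order, which is what distinguishes the online problem from its offline counterpart.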


A Comparative Study On Text Categorization, Aditya Chainulu Karamcheti May 2010

UNLV Theses, Dissertations, Professional Papers, and Capstones

Automated text categorization is a supervised learning task, defined as assigning category labels to new documents based on the likelihood suggested by a training set of labeled documents. Two examples of text categorization methods are Naive Bayes and K-Nearest Neighbor.

In this thesis, we implement two categorization engines based on the Naive Bayes and K-Nearest Neighbor methods. We then compare the effectiveness of these two engines by calculating standard precision and recall for a collection of documents. We further report on the time efficiency of these two engines.
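The precision and recall comparison can be sketched as follows for a single category. The function name and the label lists are assumptions for illustration and do not come from the thesis.

```python
def precision_recall(predicted, actual, category):
    """Compute precision and recall for one category.

    precision = TP / (TP + FP): share of predicted labels that are correct
    recall    = TP / (TP + FN): share of true labels that were found
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == category and a == category)
    fp = sum(1 for p, a in pairs if p == category and a != category)
    fn = sum(1 for p, a in pairs if p != category and a == category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical engine output versus the true labels of five documents.
predicted = ["sports", "sports", "politics", "sports", "politics"]
actual = ["sports", "politics", "politics", "sports", "sports"]
precision, recall = precision_recall(predicted, actual, "sports")
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

Running the same function over the output of each engine gives a like-for-like comparison per category.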


A Study Of Relevance Feedback In Vector Space Model, Deepthi Katta May 2009

UNLV Theses, Dissertations, Professional Papers, and Capstones

Information retrieval is the science of searching a huge set of documents for information or documents that meet an information need. It has been an active field of research since the early 19th century, and different models of retrieval came into existence to cater to that need.

This thesis starts by examining some of the basic information retrieval models, followed by an implementation of one of the most popular statistical retrieval models, the Vector Space Model. This model ranks the documents in the collection based on a similarity measure calculated between the query and each document. The user …
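The ranking step described above can be sketched with cosine similarity over sparse term vectors. This sketch assumes raw term-frequency weights and whitespace tokenization; the thesis may use a different weighting scheme (for example tf-idf), and the function names and documents here are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (score, name) pairs, most similar document first."""
    q = Counter(query.split())  # raw term frequencies (an assumption)
    scored = [(cosine(q, Counter(text.split())), name)
              for name, text in docs.items()]
    return sorted(scored, reverse=True)

# Hypothetical three-document collection.
docs = {
    "d1": "query expansion improves retrieval",
    "d2": "bin packing for cloud storage",
    "d3": "retrieval models rank documents",
}
for score, name in rank("retrieval query", docs):
    print(name, round(score, 3))
```

Here d1 shares both query terms and ranks first, d3 shares one, and d2 shares none and scores zero.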


Efficient Clustering Techniques For Managing Large Datasets, Vasanth Nemala Jan 2009

UNLV Theses, Dissertations, Professional Papers, and Capstones

The result set produced by a search engine in response to a user query is typically very large. It is usually the responsibility of the user to browse the result set to identify relevant documents. Many tools have been developed to help the user identify the most relevant documents. One such tool is clustering. In this method, closely related documents are grouped based on their contents. Hence, if a document turns out to be relevant, so are the rest of the documents in the cluster. So it would be easy for a user to sift through the …


Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni Jan 2009

UNLV Theses, Dissertations, Professional Papers, and Capstones

Document clustering, or unsupervised document classification, is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. In the literature, many similarity functions, such as the dot product and the cosine measure, have been proposed for the comparison operator.
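A small sketch shows why the choice of similarity function can matter: the dot product rewards long documents with large term counts, while the cosine measure normalizes for length. The vectors below are made-up term counts, not data from the thesis.

```python
import math

def dot(u, v):
    """Dot product of two equal-length term-count vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product normalized by the lengths of both vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical counts over a three-term vocabulary.
reference = [1, 1, 0]
short_doc = [1, 1, 0]  # same direction as the reference
long_doc = [3, 3, 6]   # larger counts, but mostly about the third term

print(dot(reference, short_doc), dot(reference, long_doc))  # 2 6
print(round(cosine(reference, short_doc), 3),
      round(cosine(reference, long_doc), 3))                # 1.0 0.577
```

The dot product ranks the long document as closer to the reference, whereas the cosine measure ranks the short one as closer, so the two functions can lead to different groupings.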

In this thesis, we evaluate the effects a similarity function may have on clustering. We start by representing a document and a query as vectors in a high-dimensional space whose dimensions correspond to the keywords, and then use an appropriate distance measure in k-means to compute the similarity between the document vector and the query vector to …


Development Of A Systems Engineering Model For Chemical Separation Process, Lijian Sun Dec 2003

UNLV Theses, Dissertations, Professional Papers, and Capstones

This thesis is concerned with the effort to develop a general-purpose systems engineering modeling software package, TRPSEMPro, that can be used to improve productivity in the design process. Different features of TRPSEMPro are presented in this thesis. First, Systems Engineering technology is presented, followed by an exposition of different numerical optimization technologies and DOE (Design of Experiments) study technologies. Second, the detailed software process, Object-Oriented Analysis and Design (OOA&D), for TRPSEMPro is presented. All the design data models are expressed in the Unified Modeling Language (UML).

AMUSESimulator is another software package that has been designed and implemented in order …