Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 26 of 26

Full-Text Articles in Entire DC Network

Query Expansion Using Wikipedia Concept Graph, Hadi Amiri, Abolfazl Ale Ahmad, Masoud Rahgozar, Farhad Oroumchian Aug 2010

A Concept Graph is a graph in which nodes are concepts and edges indicate the relationships between the concepts. In these graphs, a concept is usually represented by a single term or a phrase. Statistical methods can be used for concept graph construction; these methods are language independent and computationally efficient. One application of concept graphs is finding concepts related to the user query in a context-dependent manner. This set of concepts can be used for automatic or manual query expansion. In this paper we study and evaluate a statistical method for concept graph construction and …
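The abstract does not say which statistical method is used. One common statistical approach, assumed here purely for illustration, links two terms when they co-occur in enough documents and expands a query with each term's strongest neighbours. `build_concept_graph`, `expand_query`, `min_count`, `k` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_concept_graph(docs, min_count=2):
    """Link two terms if they co-occur in at least min_count documents; edge weight = co-occurrence count."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph.setdefault(a, {})[b] = count
            graph.setdefault(b, {})[a] = count
    return graph

def expand_query(query, graph, k=2):
    """Append the k strongest neighbours of each query term that are not already in the query."""
    terms = query.lower().split()
    extra = []
    for t in terms:
        neighbours = sorted(graph.get(t, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for term, _ in neighbours[:k]:
            if term not in terms and term not in extra:
                extra.append(term)
    return terms + extra

docs = [
    "persian text retrieval",
    "persian text stemming",
    "persian retrieval evaluation",
    "text retrieval evaluation",
]
graph = build_concept_graph(docs)
print(expand_query("persian", graph))  # ['persian', 'retrieval', 'text']
```

Document-level co-occurrence is the simplest window choice; sentence or fixed-size windows are equally plausible and would give a denser or sparser graph.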


Assessment Of Query Reweighing, By Rocchio Method In Farsi Information Retrieval, F Saboori, H Bashiri, Farhad Oroumchian Jul 2010

Because users lack knowledge of the collections used by search engines, and by retrieval systems in general, they cannot express their information needs appropriately in queries. In other words, they do not have enough experience to formulate their needs in a way that finds related documents. Query expansion aims to help users improve and correct their queries. Using the feedback it receives from the user in the first stage, the retrieval system moves the query in the vector space toward more relevant documents. Different approaches have been used in information retrieval systems; however, there has not …
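The classic Rocchio formula moves the query vector toward the centroid of documents judged relevant and away from the centroid of non-relevant ones: q' = α·q + β·centroid(R) − γ·centroid(N). The sketch below is a minimal illustration on dense term-weight lists; the default weights (α = 1, β = 0.75, γ = 0.15) are common textbook values rather than the paper's settings, and clipping negative weights to zero is a conventional choice assumed here.

```python
def centroid(vectors, dims):
    """Component-wise mean of equal-length vectors (all zeros if there are none)."""
    if not vectors:
        return [0.0] * dims
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the reweighted query vector; negative weights are clipped to zero."""
    dims = len(query)
    rel = centroid(relevant, dims)
    non = centroid(nonrelevant, dims)
    updated = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]
    return [max(0.0, w) for w in updated]

# Hypothetical term order: ["persian", "retrieval", "sports"]
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
nonrelevant = [[0.0, 0.0, 1.0]]
print(rocchio(query, relevant, nonrelevant, beta=0.5, gamma=0.5))  # [1.5, 0.25, 0.0]
```

Here the feedback gives the new term "retrieval" a positive weight (expansion), while "sports", pushed negative by the non-relevant document, is clipped to zero.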


Using Heuristic Rules To Improve Persian Part Of Speech Tagging Accuracy, Mitra Mohtarami, Hadi Amiri, Farhad Oroumchian, Masoud Rahgozar Jul 2010

One of the major activities in Natural Language Processing is determining a word's part of speech (POS) tag. In this research we focus on improving the accuracy of Persian part of speech tagging by applying post-processing heuristic rules. To evaluate the effects of those rules we use the Bijankhan tagged corpus, and for tagging we selected the Maximum Likelihood Estimation (MLE) approach because of its simplicity and ease of implementation. Furthermore, we have studied the effect of training set size on the accuracy of the MLE method. The experimental results show that the heuristic rules improve the accuracy, especially for …
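A minimal sketch of a per-word MLE tagger, assuming the common "most frequent tag for each word" formulation and, as a further assumption, a fallback to the overall most frequent tag for unseen words. The paper's heuristic post-processing rules are not shown, and the toy English tokens and tags stand in for the Bijankhan corpus and its tag set.

```python
from collections import Counter, defaultdict

def train_mle(tagged_corpus):
    """Learn each word's most frequent tag, plus the overall most frequent tag for unknown words."""
    word_tags = defaultdict(Counter)
    tag_totals = Counter()
    for word, tag in tagged_corpus:
        word_tags[word][tag] += 1
        tag_totals[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in word_tags.items()}
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag

def tag_sentence(words, lexicon, default_tag):
    """Assign argmax_t P(t | w) to each word; unseen words get default_tag."""
    return [(w, lexicon.get(w, default_tag)) for w in words]

training = [("the", "DET"), ("book", "N"), ("book", "N"), ("book", "V"),
            ("read", "V"), ("the", "DET"), ("cat", "N")]
lexicon, default_tag = train_mle(training)
print(tag_sentence(["the", "book", "library"], lexicon, default_tag))
# [('the', 'DET'), ('book', 'N'), ('library', 'N')]
```

Because every unseen word falls back to one tag, a smaller training set means more unknown words and lower accuracy, which is one reason training size matters for this method.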


Evaluation Of Part Of Speech Tagging On Persian Text, F. Raja, H. Amiri, S. Tasharofi, M. Sarmadi, H. Hojjat, Farhad Oroumchian Jul 2010

One of the fundamental tasks in natural language processing is part of speech (POS) tagging. A POS tagger is a piece of software that reads text in some language and assigns a part of speech tag to each word. Our main interest in this research was to see how easily methods used for a language such as English can be applied to a new and different language such as Persian (Farsi), and how well such approaches would perform. This paper presents an evaluation of several part of speech tagging methods on Persian text. These are …


An Enhanced Similarity Measure For Utilizing Site Structure In Web Personalization Systems, Shaghayegh Sahebi, Farhad Oroumchian, Ramtin Khosravi Jul 2010

The growth of information on the Web has made evident the need for recommendation systems that ease user navigation. There are many learning approaches for Web usage-based recommendation systems. In hybrid recommendation systems, other knowledge resources, such as content, semantics and the hyperlink structure of the Web site, have been utilized to enhance usage-based personalization systems. In this study, we introduce a new structure-based similarity measure for user sessions. We also apply two clustering algorithms with this similarity measure to compare it with cosine and another structure-based similarity measure. Our experiments show that adding structure information, leveraging the proposed similarity measure, …
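The proposed structure-based measure is not described in the abstract, so the sketch below shows only the cosine baseline it is compared against, with each session assumed to be a `{page: visit count}` dictionary. The session data and page paths are hypothetical.

```python
import math

def cosine(session_a, session_b):
    """Cosine similarity of two sessions given as {page: visit count} dictionaries."""
    dot = sum(count * session_b.get(page, 0) for page, count in session_a.items())
    norm_a = math.sqrt(sum(c * c for c in session_a.values()))
    norm_b = math.sqrt(sum(c * c for c in session_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

s1 = {"/news/politics": 2, "/home": 1}
s2 = {"/news/economy": 2, "/home": 1}
print(round(cosine(s1, s2), 2))  # 0.2
```

The two sessions visit sibling pages under `/news`, yet cosine treats those pages as unrelated and scores the pair only 0.2 on the strength of the shared `/home` visit. That is the kind of gap a measure using site structure would presumably aim to close.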


A Vector Based Method Of Ontology Matching, Z. Eidoon, N. Yazdani, Farhad Oroumchian Jul 2010

Semantic interoperability is strongly influenced by the similarities and differences that exist between ontologies. Ontology matching has emerged as a solution for finding corresponding concepts among ontologies, to facilitate semantics-based negotiation between applications. This paper presents a method of ontology matching which is based on vectorizing ontologies and estimating their degree of similarity. A post-processing step with two heuristic rules has also been employed to improve the results. The proposed method is successfully applied to the test suite of the Ontology Alignment Evaluation Initiative 2005 [10] and compared to results obtained by other methods. In general the preliminary results are encouraging and …


Web-Graph Pre-Compression For Similarity Based Algorithms, Hamid Khalili, Amir Yahyavi, Farhad Oroumchian Jul 2010

The size of the web graph created by crawling the web is an issue for every search engine. The amount of data gathered by web crawlers makes it impossible to load the web graph into memory, which increases the number of I/O operations. Compression can help reduce the run time of web graph processing algorithms. We introduce a new algorithm for compressing the link structure of the web graph by grouping similar pages together and building a smaller representation of the graph. The resulting graph has far fewer edges than the original, and the similarity between adjacency lists of nodes is increased dramatically, which makes it …
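The abstract does not describe how pages are grouped. The toy sketch below assumes a greedy grouping by Jaccard similarity of out-link sets and collapses each group into a single node. It is lossy (the original edges cannot be recovered from the result alone) and only illustrates why grouping similar pages shrinks the edge count; `group_similar`, `quotient_graph` and `threshold` are hypothetical names.

```python
def jaccard(a, b):
    """Jaccard similarity of two link sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_similar(adjacency, threshold=0.5):
    """Greedily put each page into the first group whose representative out-links are similar enough."""
    groups = []  # (representative out-link set, member pages)
    for page in sorted(adjacency):
        links = adjacency[page]
        for representative, members in groups:
            if jaccard(links, representative) >= threshold:
                members.append(page)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append((links, [page]))
    return {page: gid for gid, (_, members) in enumerate(groups) for page in members}

def quotient_graph(adjacency, group_of):
    """Collapse each group to one node; parallel edges between groups merge into one."""
    return {(group_of[src], group_of[dst]) for src, targets in adjacency.items() for dst in targets}

web = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x", "y"},
    "x": {"a"},
    "y": {"a"},
    "z": {"b"},
}
groups = group_similar(web)
original_edges = sum(len(targets) for targets in web.values())
print(original_edges, len(quotient_graph(web, groups)))  # 11 4
```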


Fusion Of Retrieval Models At CLEF 2008 Ad Hoc Persian Track, Zahra Aghazade, Nazanin Dehghani, Leili Farzinvash, Razieh Rahimi, Abolfazel Aleahmad, Hadi Amiri, Farhad Oroumchian Jul 2010

Metasearch engines submit the user query to several underlying search engines and then merge their retrieved results to generate a single list that better serves the user's information needs. Following the idea behind metasearch engines, it seems that merging the results retrieved from different retrieval models will improve search coverage and precision. In this study, we have investigated the effect of fusing different retrieval techniques on the performance of Persian retrieval. We use an extension of the Ordered Weighted Average (OWA) operator called IOWA and a weighting schema, NOWA, for merging the results. Our ex- …


Finding Similarity Relations In Presence Of Taxonomic Relations In Ontology Learning Systems, A. R. Vazifedoost, Farhad Oroumchian, M. Rahgozar Jul 2010

Ontology learning tries to find ontological relations through an automatic process. Similarity relationships are one kind of non-taxonomic relation that may be included in an ontology. Our idea is that in the presence of taxonomic relations we are able to extract more useful non-taxonomic similarity relations. In this paper we investigate the specifications of an implemented system for extracting these relations by means of a new context extraction method that uses taxonomic relations.


Keyword Suggestion Using Conceptual Graph Construction From Wikipedia Rich Documents, Hadi Amiri, Abolfazl Aleahmad, Masoud Rahgozar, Farhad Oroumchian Jul 2010

A conceptual graph is a graph in which nodes are concepts and the edges indicate the relationships between them. Creation of conceptual graphs is a hot topic in the area of knowledge discovery. Natural Language Processing (NLP) based conceptual graph creation is one of the efficient but costly methods in the field of information extraction. Compared to NLP-based methods, statistical methods have two advantages: they are language independent and more computationally efficient. In this paper we present an efficient statistical method for creating a conceptual graph from a large document collection. The documents which are used in this paper …


Cross Language Experiments At Persian@CLEF 2008, Abolfazl Ale Ahmad, Ehsan Kamalloo, Arash Zareh, Masoud Rahgozar, Farhad Oroumchian Jul 2010

In this study we discuss our cross language text retrieval (CLIR) experiments with the Persian language in the ad hoc track of CLEF 2008. Two teams from the University of Tehran were involved in the cross language text retrieval part of the track, using two different CLIR approaches: query translation and document translation. For query translation we used a method named Combinatorial Translation Probability (CTP) calculation to estimate the translation probabilities. In the document translation part we used the Shiraz machine translation system to translate the documents into English. Then we created a Hybrid CLIR system by …


Persian Email Classification Based On Rocchio And K-Nearest Neighbor Approach, H. Bashiri, Farhad Oroumchian, A. Moeini Jul 2010

These days, electronic mail (email) has become an essential form of communication in all aspects of everyday life. Besides speed of delivery and low cost, the main reason for this popularity is the convenience of managing and handling emails. However, this convenience is diminishing as the number and availability of emails grow. Managing emails is becoming more difficult every day. Not only is SPAM (unsolicited email) flooding our mailboxes, but locating important and vital information among the huge number of emails that find their way into our mailboxes has turned into a laborious …


Clusid: A Clustering Scheme For Intrusion Detection, Improved By Information Theory, R. Shokri, Farhad Oroumchian, N. Yazdani Jul 2010

Security is a big issue for all networks in any enterprise environment. Many solutions have been proposed to secure the network infrastructure and communication over the Internet. Intrusion Detection Systems (IDSs) employ many different techniques, such as data mining approaches, to maximize the detection rate of intrusions while reducing the false alarm rate. For instance, many clustering techniques that segregate normal and abnormal data have been recommended for IDSs. Clustering methods emphasize finding the differences and similarities between traffic sessions in order to place each session in its corresponding group. These groups are represented by their assigned labels. Later, these labels are …


Tuning The Lambda Parameter For Language Modeling Based Persian Retrieval, Hadi Amiri, Ashkan Zarnani, Mahbod Tavallaee, Sadra Abedinzade, Farhad Oroumchian, Masoud Rahgozar Jul 2010

Language modeling is one of the most powerful methods in information retrieval. Many language modeling based retrieval systems have been developed and tested on English collections. Hence, the evaluation of language modeling on collections in other languages is an interesting research issue. In this study, four different language modeling methods proposed by Hiemstra [1] have been evaluated on a large Persian collection of news archives. Furthermore, we study two different approaches that have been proposed for tuning the Lambda parameter. Experimental results show that the performance of language models on Persian text improves after Lambda tuning. More specifically, the Witten-Bell method …
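To illustrate the role of λ, the sketch below scores documents with a linear-interpolation (Jelinek-Mercer style) language model, log P(q|d) = Σ_t log(λ·P(t|d) + (1 − λ)·P(t|C)), and picks λ from a small grid by mean reciprocal rank on training queries. This is an assumed, simplified setup: the exact variants from Hiemstra [1] and the two tuning approaches studied in the paper may differ, and `lm_score`, `rank`, `tune_lambda`, the grid and the toy documents are hypothetical.

```python
import math
from collections import Counter

def lm_score(query, doc, collection, lam):
    """log P(query | doc) with linear interpolation of document and collection models."""
    doc_tf = Counter(doc)
    coll_tf = Counter(t for d in collection for t in d)  # recomputed per call for brevity
    coll_len = sum(coll_tf.values())
    score = 0.0
    for t in query:
        p = lam * doc_tf[t] / len(doc) + (1 - lam) * coll_tf[t] / coll_len
        if p == 0.0:
            return float("-inf")  # zero probability, e.g. term absent from the whole collection
        score += math.log(p)
    return score

def rank(query, collection, lam):
    """Document indices sorted by descending score (ties broken by index)."""
    scores = [(lm_score(query, d, collection, lam), i) for i, d in enumerate(collection)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]

def tune_lambda(training, collection, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the grid value with the highest mean reciprocal rank of the known relevant document."""
    def mrr(lam):
        total = sum(1.0 / (rank(q, collection, lam).index(rel) + 1) for q, rel in training)
        return total / len(training)
    return max(grid, key=mrr)

doc_a = ["persian", "retrieval"] + ["x"] * 8   # matches both query terms once
doc_b = ["retrieval"] * 2 + ["z"] * 8          # matches only "retrieval"
doc_c = ["persian"] * 8 + ["y"] * 2            # matches only "persian", many times
collection = [doc_a, doc_b, doc_c]
training = [(["persian", "retrieval"], 0)]     # doc 0 is the relevant one
best = tune_lambda(training, collection)
print(best, rank(["persian", "retrieval"], collection, best))  # 0.9 [0, 2, 1]
```

In this toy collection a large λ favours the document that matches both query terms, while a small λ lets documents that match only one term repeatedly outrank it.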


Ontology Matching Using Vector Space, Zahra Eidoon, Nasser Yazdani, Farhad Oroumchian Jul 2010

Interoperability of heterogeneous systems on the Web will be achieved through agreement between the underlying ontologies. Ontology matching is an operation that takes two ontologies and determines their semantic mapping. This paper presents a method of ontology matching based on modeling ontologies in a vector space and estimating their degree of similarity by matching their concept vectors. The proposed method is successfully applied to the test suite of the Ontology Alignment Evaluation Initiative 2005 [10] and compared to the results reported by other methods. In terms of precision and recall, the results look promising.
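The abstract does not state how concept vectors are built. As an assumption, the sketch below represents each concept by a bag of words drawn from its CamelCase name plus a few context terms (for example labels or neighbouring concept names), compares concepts by cosine similarity, and keeps the best match above a threshold. `concept_vector`, `match`, `threshold` and the toy ontologies are hypothetical, and the target ontology is assumed to be non-empty.

```python
import math
import re
from collections import Counter

def concept_vector(name, context_terms):
    """Bag of lower-cased words from a CamelCase concept name plus its context terms."""
    tokens = re.findall(r"[A-Z]?[a-z]+", name) + list(context_terms)
    return Counter(t.lower() for t in tokens)

def cosine(u, v):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def match(onto_a, onto_b, threshold=0.5):
    """Map each concept of onto_a to its most similar concept in onto_b, if similar enough."""
    vecs_b = {c: concept_vector(c, ctx) for c, ctx in onto_b.items()}
    alignment = {}
    for c, ctx in onto_a.items():
        va = concept_vector(c, ctx)
        best, score = max(((cb, cosine(va, vb)) for cb, vb in vecs_b.items()), key=lambda p: p[1])
        if score >= threshold:
            alignment[c] = best
    return alignment

onto_a = {"JournalArticle": ["publication", "journal"], "Author": ["person", "writer"]}
onto_b = {"Article": ["publication", "journal"], "Person": ["human", "writer"]}
alignment = match(onto_a, onto_b)
print(alignment)  # {'JournalArticle': 'Article', 'Author': 'Person'}
```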


FuFaIR: A Fuzzy Farsi Information Retrieval System, A. Nayyeri, Farhad Oroumchian Jul 2010

Persian (Farsi) is one of the languages of the Middle East. A significant number of Persian documents are available in digital form, and even more are created every day. Therefore, there is a need for information retrieval systems with high precision for this language. This paper discusses the design, implementation and testing of a fuzzy retrieval system for Persian called FuFaIR. The system also supports fuzzy quantifiers in its query language. Tests have been conducted using a standard Persian test corpus called Hamshahri. The performance results obtained from FuFaIR are positive and indicate that FuFaIR could notably outperform …


Using OWA Fuzzy Operator To Merge Retrieval System Results, H. Amiri, A. Aleahmad, Farhad Oroumchian, C. Lucas, M. Rahgozar Jul 2010

With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to user requirements. One way of improving retrieval quality is to use more than one retrieval engine, merge the retrieved results and show a single ranked list to the user. Some studies suggest that combining the results of multiple search engines improves ranking when these engines are treated as independent experts. In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and …
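An OWA operator with non-negative weights w_1..w_n summing to 1 sorts its arguments in descending order and returns Σ_j w_j·b_j, where b_j is the j-th largest value; weights concentrated at the front behave like a fuzzy "max", weights at the back like a fuzzy "min". The sketch below assumes min-max normalised scores and treats a document missing from a run as scoring 0; these choices, the weights, the toy runs and the `merge` helper are illustrative assumptions, not the paper's configuration.

```python
def owa(values, weights):
    """Ordered Weighted Average: weights apply to the values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

def min_max_normalize(scores):
    """Rescale one run's scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def merge(runs, weights):
    """Fuse several {doc: score} runs with OWA and return doc ids, best first."""
    normalized = [min_max_normalize(r) for r in runs]
    docs = set().union(*runs)
    fused = {d: owa([run.get(d, 0.0) for run in normalized], weights) for d in docs}
    return sorted(fused, key=lambda d: (-fused[d], d))

runs = [{"d1": 10.0, "d2": 5.0, "d3": 0.0},  # e.g. a language-model run
        {"d1": 1.0, "d2": 3.0}]               # e.g. a vector-space run
print(merge(runs, (0.7, 0.3)))  # ['d2', 'd1', 'd3']
```

With weights (0.7, 0.3) each document is judged mostly by its best normalised score, so d2, which is strong in both runs, overtakes d1, which is top in only one.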


Improving Persian Information Retrieval Systems Using Stemming And Part Of Speech Tagging, Reza Karimpour, Amineh Ghorbani, Azadeh Pishdad, Mitra Mohtarami, Abolfazl Ale Ahmad, Hadi Amiri, Farhad Oroumchian Jul 2010

With the emergence of vast information resources, it is necessary to develop methods that retrieve the most relevant information according to users' needs. These retrieval methods may benefit from natural language constructs, boosting their results by achieving higher precision and recall. In this study, we have used the part of speech properties of terms as an extra source of information about document and query terms and have evaluated the impact of such data on the performance of Persian retrieval algorithms. The effect of stemming has also been examined as a complement to this research. Our findings indicate that part …


Experiments With English-Persian Text Retrieval, Abolfazl Ale Ahmed, Hadi Amiri, Masoud Rahgozar, Farhad Oroumchian Jul 2010

As the number of non-English documents on the web increases dramatically, the study and design of information retrieval systems for these languages is very important. Persian is the official language of Iran, Afghanistan and Tajikistan and is also spoken in some other countries in the Middle East, so a significant number of Persian documents are available on the web. In this study, we present and compare our English-Persian cross language text retrieval experiments on the Hamshahri text collection. We also present the Combinatorial Translation Probability (CTP) calculation method for query translation, which estimates translation probabilities based …


N-Gram And Local Context Analysis For Persian Text Retrieval, A. Aleahmad, P. Hakimian, F. Mahdikhani, Farhad Oroumchian Jul 2010

The Persian language is one of the languages of the Middle East, so a significant number of Persian documents are available on the Web. However, there are relatively few studies on the retrieval of Persian documents in the literature. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, namely Local Context Analysis, using different weighting schemes on a realistic corpus containing more than 160,000 news articles. We then compared our results with previous work reported on the Persian language. Our experimental results show that among the assessed methods, the 4-gram based vector space model with Lnu.ltu weighting …
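A minimal sketch of turning Persian text into character 4-gram features, assuming overlapping n-grams taken within whitespace-separated words, with words shorter than n kept whole and no boundary padding. The paper's exact tokenisation and its Lnu.ltu weighting are not reproduced, and the example words are chosen only for illustration.

```python
from collections import Counter

def char_ngrams(word, n=4):
    """Overlapping character n-grams of one word; words of length <= n are kept whole."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def ngram_vector(text, n=4):
    """Raw n-gram frequencies for a whitespace-tokenised text (no term weighting applied)."""
    counts = Counter()
    for word in text.split():
        counts.update(char_ngrams(word, n))
    return counts

doc = ngram_vector("کتابخانه مرکزی")   # "central library"
query = ngram_vector("کتاب")            # "book"
print(sorted(set(doc) & set(query)))    # ['کتاب']
```

The query word کتاب ("book") shares a 4-gram with کتابخانه ("library"), so an n-gram index can match related word forms without a stemmer, which is one plausible reason n-gram models suit a morphologically rich language like Persian.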


Applying And Comparing Hidden Markov Model And Fuzzy Clustering Algorithms To Web Usage Data For Recommender Systems, Shaghayegh Sahebi, Farhad Oroumchian, Ramtin Khosravi Jul 2010

In this study, we apply and compare several usage pattern discovery methods for recommender systems, such as the simple k-means clustering algorithm, the fuzzy relational subtractive clustering algorithm, fuzzy mean field annealing (MFA) clustering and the Hidden Markov Model (HMM). We use metrics such as prediction strength, hit ratio, precision, prediction ability and F-Score to compare the applied methods on Web usage data. Fuzzy MFA and HMM performed better than the other methods, owing to the fuzzy nature of human navigation behavior and the extra information utilized in sequence analysis.


A3CRank: An Adaptive Ranking Method Based On Connectivity, Content And Click-Through Data, Ali M. Zareh Bidoki, Pedram Ghodsnia, Nasser Yazdani, Farhad Oroumchian Jul 2010

Due to the proliferation and abundance of information on the web, ranking algorithms play an important role in web search. Currently, there are ranking algorithms based on content and connectivity, such as PageRank and BM25. Unfortunately, these algorithms have low precision and are not always satisfactory for users. In this paper, we propose an adaptive method based on the content, connectivity and click-through data triple, called A3CRank. The aggregation idea of metasearch engines is used to combine ranking algorithms such as PageRank, BM25 and TF-IDF. We have used reinforcement learning to incorporate user behavior and find a measure …


Evaluation Of Statistical Part Of Speech Tagging Of Persian Text, Samira Tasharoft, Fahimeh Raja, Farhad Oroumchian, Masoud Rahgozar Jul 2010

Part of Speech (POS) tagging is an essential part of text processing applications. A POS tagger assigns each word of its input text a tag specifying its grammatical properties. One of the popular POS taggers is the TnT tagger, which has been shown to have high accuracy in English and some other languages. It is always interesting to see how a method developed for one language performs on another, because it gives insight into the differences and similarities between the languages. In the case of statistical methods such as TnT, this also has added practical advantages. This paper …


Effectiveness Of Rich Document Representation In XML Retrieval, F. Raja, M. Keikha, M. Rahgozar, Farhad Oroumchian Jul 2010

Information Retrieval (IR) systems are built with different goals in mind. Some IR systems target high precision, that is, having more relevant documents on the first page of their results. Other systems may target high recall, that is, finding as many references as possible. In this paper we present a method of document representation called RDR for building XML retrieval engines with high specificity, that is, finding more relevant documents that are mostly about the query topic. Rich Document Representation (RDR) is a method of representing the content of a document with logical terms and statements. The conjecture …


Using Human Plausible Reasoning As A Framework For Multilingual Information Filtering, Asma Damankesh, Jaspreet Singh, Fatima Jahedpari, Khaled Shaalan, Farhad Oroumchian Jul 2010

In this paper the application of the theory of Human Plausible Reasoning (HPR) is investigated in the domain of filtering and cross language information retrieval. The theory of Human Plausible Reasoning was first introduced by Collins and Michalski in the early 1990s and has been applied to IR since 1995. This work extends those experiments and focuses on building a framework for cross language information retrieval. The system built in these experiments uses plausible inferences to derive new, previously unknown knowledge from existing knowledge in order to retrieve not only documents that are indexed by the query terms but …


Hamshahri: A Standard Persian Text Collection, Abolfazl Aleahmad, Hadi Amiri, Masoud Rahgozar, Farhad Oroumchian Jul 2010

The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because of the special nature of the Persian language compared to other languages such as English, the design of information retrieval systems for Persian requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of …