Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Information retrieval

Articles 1 - 29 of 29

Full-Text Articles in Computer Engineering

Automatic Keyword Assignment System For Medical Research Articles Using Nearest-Neighbor Searches, Fatih Dilmaç, Adil Alpkoçak Jul 2022

Turkish Journal of Electrical Engineering and Computer Sciences

Assigning accurate keywords to research articles is an increasingly important concern. Keywords should be selected meticulously to describe an article well, since they play an important role in matching readers with research articles and help a paper reach a wider audience; improper keyword selection can attract fewer readers and diminish an article's reach. Hence, we designed and developed an automatic keyword assignment system (AKAS) for research articles based on k-nearest neighbors (k-NN) and threshold nearest neighbors (t-NN) combined with an information retrieval system (IRS), which is a corpus-based method utilizing an IRS built from the Medline dataset in …
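A minimal sketch of the corpus-based idea behind such a system, assuming a TF-IDF index over abstracts with author-assigned keywords; the function names, the scikit-learn pipeline, and the similarity-weighted voting are illustrative assumptions, not the authors' AKAS implementation:

```python
# Sketch: retrieve the k most similar indexed articles for a new abstract,
# then vote over their author-assigned keywords (similarity-weighted).
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def assign_keywords(query_abstract, corpus_texts, corpus_keywords, k=10, top_n=5):
    """corpus_keywords[i]: keyword list of corpus_texts[i] (assumed format)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(corpus_texts)
    query_vec = vectorizer.transform([query_abstract])
    sims = cosine_similarity(query_vec, doc_matrix).ravel()
    neighbors = sims.argsort()[::-1][:k]              # k nearest neighbors (k-NN)
    votes = Counter()
    for i in neighbors:
        for kw in corpus_keywords[i]:
            votes[kw] += sims[i]                      # weight votes by similarity
    return [kw for kw, _ in votes.most_common(top_n)]
```

A threshold variant (t-NN) would instead keep every neighbor whose similarity exceeds a fixed cutoff rather than a fixed k.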


Learning Term Weights By Overfitting Pairwise Ranking Loss, Ömer Şahin, İlyas Çiçekli, Gönenç Ercan Jul 2022

Turkish Journal of Electrical Engineering and Computer Sciences

A search engine strikes a balance between effectiveness and efficiency to retrieve the best documents in a scalable way. Recent deep learning-based rankers are proving effective and are improving the state of the art in relevance metrics. However, as opposed to index-based retrieval methods, neural rankers like bidirectional encoder representations from transformers (BERT) do not scale to large datasets. In this article, we propose a query term weighting method that can be used with a standard inverted index without modifying it. Query term weights are learned using relevant and irrelevant document pairs for each query, using a pairwise ranking loss. The …
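As a rough illustration of the pairwise idea, the sketch below learns one weight per query term by gradient descent on a logistic pairwise loss over a plain term-frequency scorer; the paper's BERT-based components are not reproduced, and the names here are assumptions:

```python
# Sketch: learn one weight per query term so that, for every
# (relevant, irrelevant) pair, the weighted score of the relevant
# document exceeds that of the irrelevant one.
import numpy as np

def learn_term_weights(query_terms, pairs, tf, lr=0.1, epochs=200):
    """pairs: (relevant_doc_id, irrelevant_doc_id) tuples for one query.
    tf(term, doc): term frequency of `term` in `doc` (assumed callable)."""
    w = np.ones(len(query_terms))
    for _ in range(epochs):                  # deliberately run toward overfitting
        for rel, irr in pairs:
            diff = np.array([tf(t, rel) - tf(t, irr) for t in query_terms])
            margin = w @ diff
            # gradient step on the logistic pairwise loss log(1 + exp(-margin))
            w -= lr * (-diff / (1.0 + np.exp(margin)))
    return dict(zip(query_terms, w))         # weights usable in an inverted index
```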


Changeset-Based Retrieval Of Source Code Artifacts For Bug Localization, Agnieszka Ciborowska Jan 2022

Theses and Dissertations

Modern software development is extremely collaborative and agile, with unprecedented speed and scale of activity. Popular trends like continuous delivery and continuous deployment aim at building, fixing, and releasing software with greater speed and frequency. Bug localization, which aims to automatically localize bug reports to relevant software artifacts, has the potential to improve software developer efficiency by reducing the time spent on debugging and examining code. To date, this problem has been primarily addressed by applying information retrieval techniques based on static code elements, which are intrinsically unable to reflect how software evolves over time. Furthermore, as prior approaches frequently …


Information Retrieval-Based Bug Localization Approach With Adaptive Attribute Weighting, Mustafa Erşahin, Semih Utku, Deniz Kilinç, Buket Erşahin Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

Software quality assurance is one of the crucial factors in the success of software projects. Bug fixing plays an essential role in software quality assurance, and bug localization (BL) is the first step of this process. BL is difficult and time-consuming since developers must understand the flow, coding structure, and logic of the program. Information retrieval-based bug localization (IRBL) uses the information in bug reports and source code to locate the section of code in which the bug occurs. It is difficult to apply other tools because of the diversity of software development languages, design patterns, and development …
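A minimal sketch of the IRBL scoring step, assuming each source file is split into a few textual fields whose similarities to the bug report are combined with per-field weights; the field names and default weights are illustrative, and the paper's adaptive weighting scheme is not reproduced:

```python
# Sketch: score each file as a weighted sum of per-field cosine similarities
# between the bug report and the file's textual fields.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_files(bug_report, files, weights=None):
    """files: list of dicts with textual fields, e.g. identifiers/comments/name."""
    weights = weights or {"identifiers": 0.5, "comments": 0.3, "name": 0.2}
    scores = [0.0] * len(files)
    for field, w in weights.items():
        texts = [f[field] for f in files]
        vec = TfidfVectorizer().fit(texts + [bug_report])
        sims = cosine_similarity(vec.transform([bug_report]),
                                 vec.transform(texts)).ravel()
        scores = [s + w * sim for s, sim in zip(scores, sims)]
    # return file indices, most suspicious first
    return sorted(range(len(files)), key=lambda i: -scores[i])
```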


Urban Mobile Augmented Reality Method Based On Self Organization Of Multi-Modal Data, Wanxian Guan, Wanpeng Li, Ge Lin, Huagen Wan Jun 2020

Journal of System Simulation

In recent years, urban mobile augmented reality has shown important application prospects. However, with large-scale urban scenes, how to collect, reasonably manage and organize, and efficiently use the large data sets involved has become an important issue for the performance of mobile augmented reality. To solve this problem, this paper proposes an urban mobile augmented reality method based on the self-organization of multi-modal data, with urban streetscape enhancement as the application scenario. The proposed method takes the multi-modal data as the centric context and aims at how to acquire and process raw data effectively, …


A Content-Based Recommender System For Choosing Universities, Miftahul Jannat Mokarrama, Sumi Khatun, Mohammad Shamsul Arefin Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

A recommender system (RS) is a knowledge discovery and decision-making system that has been used extensively in a myriad of applications to help people make choices from vast pools of options. This paper proposes a recommendation system to help prospective students in Bangladesh choose the private universities most suitable for their admission. Since selecting the best private university does not depend on merely a few criteria, and making a decision that weighs all those criteria is not an easy task, a recommendation system can be of great assistance to prospective students in this scenario. In this …
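A minimal content-based sketch of the idea, assuming university and student profiles can be expressed as text over shared criteria; the profiles and feature names below are invented for illustration:

```python
# Sketch: rank universities by the cosine similarity between a student's
# preference profile and each university's profile (profiles invented here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

universities = {
    "University A": "low tuition strong computer science urban campus scholarships",
    "University B": "medium tuition business focus residential campus",
}
student_profile = "computer science scholarships low tuition"

names = list(universities)
vec = TfidfVectorizer()
matrix = vec.fit_transform([universities[n] for n in names])
sims = cosine_similarity(vec.transform([student_profile]), matrix).ravel()
for name, score in sorted(zip(names, sims), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")   # higher score = better match
```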


Asap: A Source Code Authorship Program, Matthew F. Tennyson PhD Aug 2019

Faculty & Staff Research and Creative Activity

Source code authorship attribution is the task of determining who wrote a computer program, based on its source code, usually when the author is either unknown or under dispute. Areas where this can be applied include software forensics, cases of software copyright infringement, and detecting plagiarism. Numerous methods of source code authorship attribution have been proposed and studied. However, there are no known easily accessible and user-friendly programs that perform this task. Instead, researchers typically develop software in an ad hoc manner for use in their studies, and the software is rarely made publicly available. In this paper, we present …
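As one common baseline from the authorship-attribution literature (not necessarily what ASAP implements), character n-gram features paired with a nearest-neighbor classifier can attribute source files to known authors:

```python
# Sketch: character n-grams capture coding style (spacing, naming, idioms);
# a 1-nearest-neighbor classifier then attributes an unseen file.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def train_attributor(sources, authors):
    """sources: list of source-code strings; authors: parallel list of labels."""
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(3, 5)),
        KNeighborsClassifier(n_neighbors=1),
    )
    model.fit(sources, authors)
    return model   # model.predict([unknown_source]) -> predicted author
```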


Refugees' Social Media Activities In Turkey: A Computational Analysis And Demonstration Method, Muhammed Abdullah Bülbül, Salah Haj Ismail Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

This study performs a data analysis of refugees in Turkey based on their social media activities. To achieve this, we first propose a method to find their relevant public accounts and collect their activities to generate a dataset. Then, we perform spatial and temporal analysis over this dataset to shed light on the most important topics and events discussed in social networks. We present the results graphically for ease of understanding and comparison. Our results indicate that we can reveal the most shared topics over a specific time and place as well as the change of pattern in refugees' …


Query Expansion Techniques For Enterprise Search, Eric M. Domke Dec 2017

Masters Theses

Although web search remains an active research area, interest in enterprise search has waned. This is despite the fact that the market for enterprise search applications is expected to triple within the next six years, and that knowledge workers spend an average of 1.6 to 2.5 hours each day searching for information. To improve search relevancy, and hence reduce this time, an enterprise-focused application must be able to handle the unique queries and constraints of the enterprise environment. The goal of this thesis research was to develop, implement, and study query expansion techniques that are most effective at improving …
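One simple form of query expansion, sketched below, augments the user's query with related terms from a hand-built domain thesaurus before it reaches the index; the synonym table is an invented example, not a technique evaluated in the thesis:

```python
# Sketch: expand each query term with entries from a hand-built thesaurus
# (the table below is invented for illustration).
SYNONYMS = {
    "invoice": ["billing", "receipt"],
    "defect": ["bug", "nonconformance"],
}

def expand_query(query, max_expansions=2):
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(SYNONYMS.get(t, [])[:max_expansions])
    return " ".join(expanded)

print(expand_query("defect report"))   # -> "defect report bug nonconformance"
```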


Using Latent Semantic Analysis For Automated Keyword Extraction From Large Document Corpora, Tuğba Önal Süzek Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

In this study, we describe a keyword extraction technique that uses latent semantic analysis (LSA) to identify semantically important single-topic words, or keywords. We compare our method against two other automated keyword extractors, Tf-idf (term frequency-inverse document frequency) and MetaMap, using human-annotated keywords as a reference. Our results suggest that the LSA-based keyword extraction method performs comparably to the other techniques. Therefore, in an incremental update setting, the LSA-based method is preferable to existing keyword extraction methods for extracting keywords from big-data text descriptions.
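A minimal sketch of LSA-based keyword extraction, assuming a truncated SVD of the TF-IDF term-document matrix with terms scored by their loadings on the latent topics; the parameters and scoring rule are illustrative assumptions:

```python
# Sketch: factor the TF-IDF matrix with truncated SVD and score each term
# by its strongest loading across the latent topics.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_keywords(docs, n_topics=5, top_n=10):
    """n_topics must be smaller than the number of documents."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    svd = TruncatedSVD(n_components=n_topics).fit(X)
    terms = np.array(vec.get_feature_names_out())
    scores = np.abs(svd.components_).max(axis=0)   # per-term topic loading
    return terms[np.argsort(scores)[::-1][:top_n]].tolist()
```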


Detecting, Modeling, And Predicting User Temporal Intention, Hany M. Salaheldeen Jul 2015

Computer Science Theses & Dissertations

The content of social media has grown exponentially in recent years, and its role has evolved from narrating life events to actually shaping them. Unfortunately, content posted and shared in social networks is vulnerable and prone to loss or change, rendering the context associated with it (a tweet, post, status, or others) meaningless. There is an inherent value in maintaining the consistency of such social records, as in some cases they take over the task of being the first draft of history: collections of these social posts narrate the pulse of the street during historic events, protests, riots, …


Source Code Retrieval From Large Software Libraries For Automatic Bug Localization, Bunyamin Sisman Oct 2013

Open Access Dissertations

This dissertation advances the state-of-the-art in information retrieval (IR) based approaches to automatic bug localization in software. In an IR-based approach, one first creates a search engine using a probabilistic or a deterministic model for the files in a software library. Subsequently, a bug report is treated as a query to the search engine for retrieving the files relevant to the bug. With regard to the new work presented, we first demonstrate the importance of taking version histories of the files into account for achieving significant improvements in the precision with which the files related to a bug are located. …
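A minimal sketch of one way version history can be folded into IR-based bug localization: combine the retrieval score with a recency-weighted prior over each file's past changes. The exponential decay and the mixing weight are illustrative assumptions, not the dissertation's exact model:

```python
# Sketch: an exponentially decayed count of a file's past changes serves as
# a prior, mixed with the IR score of the bug report against the file.
import math
import time

def history_prior(commit_times, now=None, half_life_days=30.0):
    """commit_times: UNIX timestamps of past commits touching one file."""
    now = now if now is not None else time.time()
    decay = math.log(2) / (half_life_days * 86400)
    return sum(math.exp(-decay * (now - t)) for t in commit_times)

def combined_score(ir_score, commit_times, alpha=0.5):
    return alpha * ir_score + (1 - alpha) * history_prior(commit_times)
```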


Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin Oct 2012

Computational Modeling & Simulation Engineering Theses & Dissertations

Initialization is one of the most important processes for obtaining successful results from a simulation. However, initialization is a challenge when 1) a simulation requires hundreds or even thousands of input parameters, or 2) the simulation must be re-initialized due to different initial conditions or runtime errors. These challenges lead to the modeler spending more time initializing a simulation and may lead to errors due to poor input data.

This thesis proposes two semi-automatic simulation initialization approaches that provide initialization using data mining from structured and unstructured data formats from local and web data sources. First, the System Initialization with Retrieval (SIR) …


Investigation Of Luhn's Claim On Information Retrieval, İlker Kocabaş, Bekir Taner Dinçer, Bahar Karaoğlan Jan 2011

Turkish Journal of Electrical Engineering and Computer Sciences

In this study, we show how Luhn's claim about the degree of importance of a word in a document can be related to information retrieval. His basic idea is transformed into z-scores, used as term weights for the purpose of modeling term frequency (tf) within documents. The Luhn-based models presented in this paper serve as the TF component of the proposed TF × IDF weighting schemes. Moreover, the final term weighting functions appropriate for the TF × IDF weighting scheme are applied to the TREC-6, -7, and -8 databases. The experimental results show relevance to Luhn's claim by having high …
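Read literally, the abstract suggests a TF component built from the z-score of a term's frequency; below is a hedged sketch of such a TF × IDF weight, with the exact normalization being an assumption rather than the paper's formula:

```python
# Sketch: standardize a term's within-document frequency against its
# collection-wide mean and standard deviation, then apply IDF.
import math

def z_tf_idf(tf, mean_tf, std_tf, df, n_docs):
    """tf: term frequency in this document; mean_tf/std_tf: collection stats;
    df: document frequency of the term; n_docs: collection size."""
    z = (tf - mean_tf) / std_tf if std_tf > 0 else 0.0   # z-score TF component
    return z * math.log(n_docs / df)                     # x IDF
```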


Use Of Negation In Search, Kristen M. Lancaster Jun 2010

Theses and Dissertations

Boolean algebra was developed in the 1840s. Since that time, negation, one of the three basic concepts in Boolean algebra, has influenced the fields of information science and information retrieval, particularly in the modern computer era. In Web search engines, one of the present manifestations of information retrieval, little use is being made of this functionality and so little attention is given to it in the literature. This study aims to bolster the understanding of the use and usefulness of negation. Specifically, an Internet search task was developed for which negation was the most appropriate search strategy. This search task …


Augmenting Latent Dirichlet Allocation And Rank Threshold Detection With Ontologies, Laura A. Isaly Mar 2010

Theses and Dissertations

In an ever-increasing data-rich environment, actionable information must be extracted, filtered, and correlated from massive amounts of disparate, often free-text sources. The usefulness of the retrieved information depends on how we accomplish these steps and present the most relevant information to the analyst. One method for extracting information from free text is Latent Dirichlet Allocation (LDA), a document categorization technique that classifies documents into cohesive topics. Although LDA accounts for some implicit relationships such as synonymy (same meaning), it often ignores other semantic relationships such as polysemy (different meanings), hyponyms (subordinate), meronyms (part of), and troponyms (manner). To …
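A minimal LDA sketch using scikit-learn, recovering per-topic term lists from a document collection; the ontology-based augmentation proposed in the thesis is not shown:

```python
# Sketch: fit LDA on a bag-of-words matrix and list the top terms per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def lda_topics(docs, n_topics=10, top_n=8):
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics).fit(X)
    terms = vec.get_feature_names_out()
    return [[terms[i] for i in comp.argsort()[::-1][:top_n]]
            for comp in lda.components_]
```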


Efficient Storage And Domain-Specific Information Discovery On Semistructured Documents, Fernando R. Farfan Nov 2009

FIU Electronic Theses and Dissertations

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving …


Lightweight Federation Of Non-Cooperating Digital Libraries, Rong Shi Apr 2005

Computer Science Theses & Dissertations

This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs). The objective of this research is to demonstrate the feasibility of interoperability among non-cooperating DLs by presenting a lightweight, data driven approach, or Data Centered Interoperability (DCI). We build a Lightweight Federated Digital Library (LFDL) system to provide federated search service for existing digital libraries with no prior coordination.

We describe the motivation, architecture, design, and implementation of the LFDL. We develop, deploy, and evaluate key services of the federation. The major difference from existing DL interoperability approaches is that we do not insist on …


Probability And Agents, Marco Valtorta, Michael N. Huhns Jan 2001

Faculty Publications

To make sense of the information that agents gather from the Web, they need to reason about it. If the information is precise and correct, they can use engines such as theorem provers to reason logically and derive correct conclusions. Unfortunately, the information is often imprecise and uncertain, which means they will need a probabilistic approach. More than 150 years ago, George Boole presented the logic that bears his name. There is concern that classical logic is not sufficient to model how people do or should reason. Adopting a probabilistic approach in constructing software agents and multiagent systems simplifies some …


Buckets: Smart Objects For Digital Libraries, Michael L. Nelson Jan 2001

Computer Science Faculty Publications

Current discussion of digital libraries (DLs) is often dominated by the merits of the respective storage, search and retrieval functionality of archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, the information content is more important than the systems used for its storage and retrieval. Digital information should have the same long-term survivability prospects as traditional hardcopy information and should be protected to the extent possible from evolving search engine technologies and vendor vagaries in database management systems. Information content and information retrieval systems should progress on independent paths and make limited …


Consensus Ontologies: Reconciling The Semantics Of Web Pages And Agents, Larry M. Stevens, Michael N. Huhns Jan 2001

Faculty Publications

As you build a Web site, it is worthwhile asking, "Should I put my information where it belongs or where people are most likely to look for it?" Our recent research into improving searching through ontologies is providing some interesting results to answer this question. The techniques developed by our research bring organization to the information received and reconcile the semantics of each document. Our goal is to help users retrieve dynamically generated information that is tailored to their individual needs and preferences. We believe that it is easier for individuals or small groups to develop their own ontologies, regardless …


Architectural Optimization Of Digital Libraries, Aileen O. Biser Aug 1998

Computer Science Theses & Dissertations

This work investigates performance and scaling issues relevant to large scale distributed digital libraries. Presently, performance and scaling studies focus on specific implementations of production or prototype digital libraries. Although useful information is gained to aid these designers and other researchers with insights to performance and scaling issues, the broader issues relevant to very large scale distributed libraries are not addressed. Specifically, no current studies look at the extreme or worst case possibilities in digital library implementations. A survey of digital library research issues is presented. Scaling and performance issues are mentioned frequently in the digital library literature but are …


Creating A Canonical Scientific And Technical Information Classification System For Ncstrl+, Melissa E. Tiffany, Michael L. Nelson Jan 1998

Computer Science Faculty Publications

The purpose of this paper is to describe the new subject classification system for the NCSTRL+ project. NCSTRL+ is a canonical digital library (DL) based on the Networked Computer Science Technical Report Library (NCSTRL). The current NCSTRL+ classification system uses the NASA Scientific and Technical Information (STI) subject classifications, which have a bias towards the aerospace, aeronautics, and engineering disciplines. Examination of other scientific and technical information classification systems showed similar discipline-centric weaknesses. Traditional, library-oriented classification systems represented all disciplines, but were too generalized to serve the needs of a scientifically and technically oriented digital library. Lack of a suitable existing …


A Performance Analysis Of The Faugeras Color Space As A Component Of Color Histogram-Based Image Retrieval, Chad A. Vander Meer Dec 1997

Theses and Dissertations

The use of color histograms for image retrieval from databases has been implemented in many variations. Selecting the appropriate color space for similarity comparisons is an important part of a color histogram technique. This paper serves to introduce and evaluate the performance of a color space through the use of color histograms. Performance is evaluated by correlating the similarity results obtained from various color feature vector techniques (including color histogramming) to those gathered through a human perceptual test. The perceptual test required 36 human subjects to evaluate the similarity of 10 military aircraft images. The same 10 images were also …
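A minimal sketch of color-histogram retrieval using OpenCV, ranking candidates by histogram intersection in the default BGR space; the Faugeras color-space conversion evaluated in the thesis is not reproduced:

```python
# Sketch: a normalized 3-D BGR histogram per image, compared with
# histogram intersection (higher = more similar).
import cv2

def histogram(image_path, bins=32):
    img = cv2.imread(image_path)
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def rank_by_similarity(query_path, candidate_paths):
    q = histogram(query_path)
    sims = [(p, cv2.compareHist(q, histogram(p), cv2.HISTCMP_INTERSECT))
            for p in candidate_paths]
    return sorted(sims, key=lambda x: -x[1])
```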


Lyceum: A Multi-Protocol Digital Library Gateway, Ming-Hokng Maa, Michael L. Nelson, Sandra L. Esler Jan 1997

Computer Science Faculty Publications

Lyceum is a prototype scalable query gateway that provides a logically central interface to multi-protocol and physically distributed, digital libraries of scientific and technical information. Lyceum processes queries to multiple syntactically distinct search engines used by various distributed information servers from a single logically central interface without modification of the remote search engines. A working prototype (http://www.larc.nasa.gov/lyceum/) demonstrates the capabilities, potentials, and advantages of this type of meta-search engine by providing access to over 50 servers covering over 20 disciplines.


The Agent Test, Michael N. Huhns, Munindar P. Singh Jan 1997

Faculty Publications

The authors consider agents on the World Wide Web, including information retrieval agents. They propose a test for agenthood, involving communication in multi-agent systems.


Electronic Document Distribution: Design Of The Anonymous Ftp Langley Technical Report Server, Michael L. Nelson, Gretchen L. Gottlich Jan 1994

Computer Science Faculty Publications

An experimental electronic dissemination project, the Langley Technical Report Server (LTRS), has been undertaken to determine the feasibility of delivering Langley technical reports directly to the desktops of researchers worldwide. During the first six months, over 4700 accesses occurred and over 2400 technical reports were distributed. This usage indicates the high level of interest that researchers have in performing literature searches and retrieving technical reports at their desktops. The initial system was developed with existing resources and technology. The reports are stored as files on an inexpensive UNIX workstation and are accessible over the Internet. This project will serve as …


World Wide Web Implementation Of The Langley Technical Report Server, Michael L. Nelson, Gretchen L. Gottlich, David J. Bianco Jan 1994

Computer Science Faculty Publications

On January 14, 1993, NASA Langley Research Center (LaRC) made approximately 130 formal, 'unclassified, unlimited' technical reports available via the anonymous FTP Langley Technical Report Server (LTRS). LaRC was the first organization to provide a significant number of aerospace technical reports for open electronic dissemination. LTRS has been successful in its first 18 months of operation, with over 11,000 reports distributed and has helped lay the foundation for electronic document distribution for NASA. The availability of World Wide Web (WWW) technology has revolutionized the Internet-based information community. This paper describes the transition of LTRS from a centralized FTP site to …


Reference Retrieval Based On User Induced Dynamic Clustering., Robert N. Oddy Dec 1974

Robert Oddy

The problem of mechanically retrieving references to documents, as a first step to fulfilling the information need of a researcher, is tackled through the design of an interactive computer program. A view of reference retrieval is presented which embraces the browsing activity. In fact, browsing is considered important and regarded as ubiquitous. Thus, for successful retrieval (in many circumstances), a device which permits conversation is needed. Approaches to automatic (delegated) retrieval are surveyed, as are on-line systems which support interaction. This type of interaction usually consists of iteration, under the user's control, in the query formulation process. A program has …