Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 5 of 5

Full-Text Articles in Databases and Information Systems

Word Embedding Driven Concept Detection In Philosophical Corpora, Dylan Hayton-Ruffner Jan 2020

Honors Projects

In the course of research, scholars often search large textual databases for passages relevant to their conceptual analyses. This study proposes, develops, and evaluates two algorithms for automated concept detection in theoretical corpora: ACS and WMD retrieval. Both novel algorithms are compared with keyword retrieval on a test set from the Digital Ricoeur corpus tagged by scholarly experts. WMD retrieval outperforms keyword search on the concept detection task and is therefore a promising tool for concept detection and information retrieval systems focused on theoretical corpora.
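
The abstract does not describe how either algorithm works. As a rough illustration of why an embedding-based distance can surface passages that keyword search misses, the following is a minimal sketch, not the study's implementation. It assumes that WMD stands for Word Mover's Distance and uses a simplified "relaxed" variant in which each query word is matched to its nearest passage word. The toy vectors, example passages, and function names are all invented for illustration.

```python
import math

# Toy 2-D word vectors, invented for illustration only (not taken from the study).
EMBEDDINGS = {
    "memory": (1.0, 0.0),
    "remembering": (0.9, 0.1),
    "narrative": (0.0, 1.0),
    "story": (0.1, 0.9),
}


def keyword_match(query, passage):
    """Baseline: True if the passage shares at least one token with the query."""
    return bool(set(query.split()) & set(passage.split()))


def relaxed_wmd(query, passage):
    """Relaxed Word Mover's Distance (a lower bound on the full distance).

    Each query token moves all of its weight to the nearest passage token;
    the result is the average of those nearest-neighbour distances.
    Tokens without a vector are ignored.
    """
    q = [EMBEDDINGS[w] for w in query.split() if w in EMBEDDINGS]
    p = [EMBEDDINGS[w] for w in passage.split() if w in EMBEDDINGS]
    if not q or not p:
        return math.inf  # nothing to compare
    return sum(min(math.dist(a, b) for b in p) for a in q) / len(q)


def rank(query, passages):
    """Order passages from most to least similar to the query."""
    return sorted(passages, key=lambda s: relaxed_wmd(query, s))


passages = ["narrative story", "remembering story"]
best = rank("memory", passages)[0]
```

In this toy setup the query "memory" shares no token with either passage, so the keyword baseline retrieves nothing, while the distance-based ranking still places "remembering story" first because its vectors lie closest to the query's.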


User Interface Design, Moritz Stefaner, Sebastien Ferre, Saverio Perugini, Jonathan Koren, Yi Zhang Apr 2016

Saverio Perugini

As detailed in Chap. 1, system implementations for dynamic taxonomies and faceted search allow a wide range of queries on the data. Only when these are made accessible through appropriate user interfaces can the resulting applications support a variety of search, browsing, and analysis tasks. User interface design in this area faces specific challenges. This chapter presents an overview of both established and novel principles and solutions.


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jan 2015

Zhongmei Yao

Clustering is well suited to Web mining because it automatically organizes Web pages into categories, each containing pages with similar content. However, clustering lacks a general method for automatically determining the number of categories, or clusters, and no existing method is suitable for Web page clustering in particular. To address this problem, we discover a constant factor that characterizes the Web domain and, based on it, propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


User Interface Design, Moritz Stefaner, Sebastien Ferre, Saverio Perugini, Jonathan Koren, Yi Zhang Jan 2009

Computer Science Faculty Publications

As detailed in Chap. 1, system implementations for dynamic taxonomies and faceted search allow a wide range of queries on the data. Only when these are made accessible through appropriate user interfaces can the resulting applications support a variety of search, browsing, and analysis tasks. User interface design in this area faces specific challenges. This chapter presents an overview of both established and novel principles and solutions.


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jun 2005

Computer Science Faculty Publications

Clustering is well suited to Web mining because it automatically organizes Web pages into categories, each containing pages with similar content. However, clustering lacks a general method for automatically determining the number of categories, or clusters, and no existing method is suitable for Web page clustering in particular. To address this problem, we discover a constant factor that characterizes the Web domain and, based on it, propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …