Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Syracuse University

School of Information Studies - Faculty Scholarship

Text categorization

Articles 1 - 3 of 3

Full-Text Articles in Library and Information Science

Collecting Legacy Corpora From Social Science Research For Text Mining Evaluation, Bei Yu, Min-Chun Ku Oct 2010

In this poster we describe a pilot study of searching social science literature for legacy corpora to evaluate text mining algorithms. The emerging field of computational social science demands large amounts of social science data to train and evaluate computational models. We argue that the legacy corpora annotated by social science researchers through traditional Qualitative Data Analysis (QDA) are ideal data sets for evaluating text mining methods, such as text categorization and clustering. As a pilot study, we searched for articles that involve content analysis and discourse analysis in leading communication journals, and then contacted the authors regarding …


Exploring The Characteristics Of Opinion Expressions For Political Opinion Classification, Bei Yu, Stefan Kaufmann, Daniel Diermeier May 2008

Recently there has been increasing interest in constructing general-purpose political opinion classifiers for applications in e-Rulemaking. This problem is generally modeled as a sentiment classification task in a new domain. However, the classification accuracy is not as good as that in other domains such as customer reviews. In this paper, we report the results of a series of experiments designed to explore the characteristics of political opinion expression which might affect the sentiment classification performance. We found that the average sentiment level of Congressional debate is higher than that of neutral news articles, but lower than that of movie reviews. …


Improved Document Representation For Classification Tasks For The Intelligence Community, Elizabeth D. Liddy, Ozgur Yilmazel, Svetlana Symonenko, Niranjan Balasubramanian Jan 2005

This research addresses the question of whether the AI technologies of Natural Language Processing (NLP) and Machine Learning (ML) can be used to improve security within the Intelligence Community (IC).