Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Articles 1 - 18 of 18

Full-Text Articles in Library and Information Science

Assessing Topical Homogeneity With Word Embedding And Distance Matrices, Jeffrey M. Stanton, Yisi Sang Oct 2020

School of Information Studies - Faculty Scholarship

Researchers from many fields have used statistical tools to make sense of large bodies of text. Many tools support quantitative analysis of documents within a corpus, but relatively few studies have examined the statistical characteristics of whole corpora. Statistical summaries of whole corpora, and comparisons between corpora, have potential applications in the analysis of topically organized platforms such as social media. In this study, we created matrix representations of several corpora and examined several statistical tests for comparing pairs of corpora with respect to the topical homogeneity of the documents within each corpus. Results of three experiments suggested that a …
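The corpus-level comparison this abstract describes can be illustrated with a minimal sketch (not the authors' implementation): represent each document as a vector, compute the pairwise distance matrix, and use the mean pairwise cosine distance as a rough homogeneity score — lower means more topically homogeneous. The toy embeddings below are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two document vectors: 1 - cos(u, v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_pairwise_distance(vectors):
    """Average of the upper triangle of the distance matrix:
    lower values suggest a more topically homogeneous corpus."""
    n = len(vectors)
    dists = [cosine_distance(vectors[i], vectors[j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

# Hypothetical 2-d document embeddings (in practice, e.g. averaged word embeddings).
tight_corpus = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.15]]   # documents point the same way
mixed_corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]    # documents diverge

assert mean_pairwise_distance(tight_corpus) < mean_pairwise_distance(mixed_corpus)
```

A real study would of course use high-dimensional embeddings and formal statistical tests on the distance matrices rather than a bare mean.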


Collecting Legacy Corpora From Social Science Research For Text Mining Evaluation, Bei Yu, Min-Chun Ku Oct 2010

School of Information Studies - Faculty Scholarship

In this poster we describe a pilot study of searching the social science literature for legacy corpora with which to evaluate text mining algorithms. The emerging field of computational social science demands large amounts of social science data to train and evaluate computational models. We argue that legacy corpora annotated by social science researchers through traditional Qualitative Data Analysis (QDA) are ideal data sets for evaluating text mining methods such as text categorization and clustering. As a pilot study, we searched for articles that involve content analysis and discourse analysis in leading communication journals, and then contacted the authors regarding …


A Longitudinal Study Of Language And Ideology In Congress, Bei Yu, Daniel Diermeier Apr 2010

School of Information Studies - Faculty Scholarship

This paper presents an analysis of the legislative speech records from the 101st-108th U.S. Congresses using machine learning and natural language processing methods. We use word vectors to represent the speeches in both the Senate and the House, and then use text categorization methods to classify the speakers by their ideological positions. The classification accuracy indicates the degree of distinction between the liberal and conservative ideologies. Our experimental results demonstrate increasing partisanship in Congress between 1989 and 2006. Ideology classifiers trained on House speeches predict Senators' ideological positions well (House-to-Senate prediction); however, the Senate-to-House …


Questions To Be Asked & Answered On NLP's Role In Improving Semantic Annotation For IR, Elizabeth D. Liddy Jan 2010

School of Information Studies - Faculty Scholarship

What was early information retrieval like? (before it was called search!)

How was NLP first applied to the task?

Which levels of language analysis were utilized?

Which were successful? Which were not?

Why were other levels not incorporated?

Do we now see that the higher levels can and need to be included?

If they are, how might they change how we do IR, as well as what tasks we use it for?


An Evaluation Of Text Classification Methods For Literary Study, Bei Yu Jan 2008

School of Information Studies - Faculty Scholarship

This article presents an empirical evaluation of text classification methods in the literary domain. This study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs), in two literary text classification tasks: the eroticism classification of Dickinson's poems and the sentimentalism classification of chapters in early American novels. The algorithms were also combined with three text pre-processing tools, namely stemming, stopword removal, and statistical feature selection, to study the impact of these tools on the classifiers' performance in the literary setting. Existing studies outside the literary domain indicated that SVMs are generally better than naïve Bayes classifiers. …
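As a rough illustration of one of the two algorithms compared, here is a minimal multinomial naïve Bayes classifier with add-one (Laplace) smoothing. The token lists and labels are hypothetical toy data, not the authors' corpora or experimental setup.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model with add-one smoothing.
    docs: list of token lists; labels: parallel list of class names."""
    vocab = {w for doc in docs for w in doc}
    word_counts = {c: Counter() for c in set(labels)}
    class_counts = Counter(labels)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    return vocab, word_counts, class_counts, len(docs)

def predict_nb(model, doc):
    """Return the class with the highest log posterior for a token list."""
    vocab, word_counts, class_counts, n_docs = model
    best_class, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        total = sum(word_counts[c].values())
        lp = math.log(n_c / n_docs)  # log prior
        for w in doc:                # log likelihood, Laplace-smoothed
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best_class, best_lp = c, lp
    return best_class

# Hypothetical "literary" training data, for illustration only.
train_docs = [["love", "heart"], ["love", "tears"], ["ship", "sea"], ["sea", "storm"]]
train_labels = ["sentimental", "sentimental", "nautical", "nautical"]
model = train_nb(train_docs, train_labels)
```

In practice one would use a library implementation and the pre-processing steps named in the abstract (stemming, stopword removal, feature selection) before training.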


Certainty Identification In Texts: Categorization Model And Manual Tagging Results, Elizabeth D. Liddy, Victoria L. Rubin, Noriko Kando Jan 2006

School of Information Studies - Faculty Scholarship

This chapter presents a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. Our contribution is in a proposed categorization model and analytical framework for certainty identification. Certainty is presented as a type of subjective information available in texts. Statements with explicit certainty markers were identified and categorized according to four hypothesized dimensions – level, perspective, focus, and time of certainty.

The preliminary results reveal an overall promising picture of the presence of certainty information in texts, and establish its susceptibility to manual identification within the proposed four-dimensional certainty categorization analytical framework. …


Improved Document Representation For Classification Tasks For The Intelligence Community, Elizabeth D. Liddy, Ozgur Yilmazel, Svetlana Symonenko, Niranjan Balasubramanian Jan 2005

School of Information Studies - Faculty Scholarship

This research addresses the question of whether the AI technologies of Natural Language Processing (NLP) and Machine Learning (ML) can be used to improve security within the Intelligence Community (IC).


Hands-On NLP For An Interdisciplinary Audience, Elizabeth D. Liddy, Nancy McCracken Jan 2005

School of Information Studies - Faculty Scholarship

The need for a single NLP offering for a diverse mix of graduate students (including computer scientists, information scientists, and linguists) has motivated us to develop a course that provides students with a breadth of understanding of the scope of real-world applications, as well as depth of knowledge of the computational techniques on which to build in later experiences. We describe the three hands-on tasks for the course that have proven successful, namely: 1) in-class group simulations of computational processes; 2) team posters and public presentations on state-of-the-art commercial NLP applications; and 3) team projects implementing various levels of …


Document Retrieval, Automatic, Elizabeth D. Liddy Jan 2005

School of Information Studies - Faculty Scholarship

Document retrieval is the computerized process of producing a relevance-ranked list of documents in response to an inquirer's request by comparing the request to an automatically produced index of the documents in the system. Everyone uses such systems today in the form of web-based search engines. While evolving from a fairly small discipline in the 1940s to a large, profitable industry today, the field has maintained a healthy research focus, supported by test collections and large-scale annual comparative tests of systems. A document retrieval system comprises three core modules: a document processor, a query analyzer, and a matching function. There …
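The three-module architecture described above can be sketched in miniature: a document processor that builds a tf-idf index, and a query analyzer plus matching function that score documents against the request and return a relevance-ranked list. This is a simplified illustration, not a description of any particular system.

```python
import math
from collections import Counter

def build_index(docs):
    """Document processor: tokenize each document and weight its terms by tf-idf."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))                      # document frequency per term
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    index = []
    for toks in tokenized:
        tf = Counter(toks)
        index.append({w: tf[w] * idf[w] for w in tf})
    return index, idf

def retrieve(query, index, idf):
    """Query analyzer + matching function: score each document by the summed
    tf-idf weight of shared terms; return doc ids ranked by relevance."""
    q_terms = query.lower().split()
    scores = [(sum(vec.get(w, 0.0) for w in q_terms), i)
              for i, vec in enumerate(index)]
    return [i for score, i in sorted(scores, reverse=True) if score > 0]

docs = ["the cat sat", "the dog ran", "cat and dog"]
index, idf = build_index(docs)
```

Real systems add stemming, stopword handling, length normalization, and far more sophisticated matching, but the three-part division is the same.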


Discerning Emotions In Texts, Victoria L. Rubin, Jeffrey M. Stanton, Elizabeth D. Liddy Jan 2004

School of Information Studies - Faculty Scholarship

We present an empirically verified model of discernible emotions, Watson and Tellegen's Circumplex Theory of Affect from social and personality psychology, and suggest its usefulness in NLP as a potential model for automating an eight-fold categorization of emotions in written English texts. We developed a data collection tool based on the model, collected 287 responses from 110 non-expert informants on 50 emotional excerpts (min = 12, max = 348, average = 86 words), and analyzed the inter-coder agreement per category and the strength of ratings per sub-category. The respondents achieved an average 70.7% agreement in the most commonly identified emotion categories per text. …
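Simple percent agreement of the kind reported above can be computed as the mean pairwise agreement across coders. A minimal sketch with hypothetical coder labels (plain percent agreement, with no chance correction such as kappa):

```python
from itertools import combinations

def pairwise_percent_agreement(ratings):
    """ratings: one list of labels per coder, aligned by item.
    Returns the mean fraction of items on which each pair of coders
    assigned the same label."""
    pair_scores = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1)
        for r1, r2 in combinations(ratings, 2)
    ]
    return sum(pair_scores) / len(pair_scores)

# Hypothetical labels from three coders over four excerpts.
coders = [
    ["joy", "fear", "joy", "anger"],
    ["joy", "fear", "sadness", "anger"],
    ["joy", "joy", "joy", "anger"],
]
```

With these toy labels the three coder pairs agree on 3/4, 3/4, and 2/4 of the items, for a mean of about 0.67.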


Context-Based Question-Answering Evaluation, Elizabeth D. Liddy, Anne R. Diekema, Ozgur Yilmazel Jan 2004

School of Information Studies - Faculty Scholarship

In this poster, we will present the results of efforts we have undertaken to evaluate a QA system in a real-world environment and to understand the nature of the dimensions on which users evaluate QA systems when given free rein to comment on whatever dimensions they deem important.


Certainty Categorization Model, Elizabeth D. Liddy, Noriko Kando, Victoria L. Rubin Jan 2004

School of Information Studies - Faculty Scholarship

We present a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. The explicit certainty markers were identified and categorized according to the four hypothesized dimensions – perspective, focus, timeline, and level of certainty. One hundred twenty-one sentences from sample news stories contained a significantly lower frequency of markers per sentence (M = 0.46, SD = 0.04) than 564 sentences from sample editorials (M = 0.6, SD = 0.23), p = 0.0056, two-tailed heteroscedastic t-test. Within each dimension, editorials had the most numerous markers per sentence in the high level of certainty, the writer's point of view, and future and present …
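The two-tailed heteroscedastic t-test cited above is based on Welch's t statistic, which does not assume equal variances in the two samples. A minimal sketch with hypothetical marker-frequency samples (it computes the statistic only, not the p-value):

```python
import math

def welch_t(xs, ys):
    """Welch's two-sample t statistic (heteroscedastic, i.e. unequal variances):
    t = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2), with sample variances s^2."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    v1 = sum((x - m1) ** 2 for x in xs) / (n1 - 1)
    v2 = sum((y - m2) ** 2 for y in ys) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# Hypothetical markers-per-sentence samples, not the study's data.
editorial = [0.9, 0.6, 0.7, 0.4, 0.5]
news = [0.45, 0.5, 0.42, 0.48, 0.44]
```

A positive t here indicates the first sample's mean exceeds the second's; the p-value would come from the t distribution with Welch-Satterthwaite degrees of freedom.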


What Do You Mean? Finding Answers To Complex Questions, Anne R. Diekema, Ozgur Yilmazel, Jiangping Chen, Sarah Harwell, Elizabeth D. Liddy, Lan He Jan 2003

School of Information Studies - Faculty Scholarship

This paper illustrates ongoing research and issues faced when dealing with real-time questions in the domain of Reusable Launch Vehicles (aerospace engineering). The question-answering system described in this paper is used in a collaborative learning environment with real users and live questions. The paper describes an analysis of these more complex questions as well as research to include the user in the question-answering process by implementing a question negotiation module based on the traditional reference interview.


Transformation Based Learning For Specialization Of Generic Event Extractions, Mary D. Taffet, Nancy McCracken, Eileen Allen, Elizabeth D. Liddy Dec 2002

School of Information Studies - Faculty Scholarship

As part of our Evidence Extraction and Link Discovery (EELD) project, we proposed to use Transformation-Based Learning (TBL) to learn domain-specific specializations of generic event extractions. The primary goal of our learning task was to reduce the human effort required to specialize generic event extractions to new, specific domains. Three initial annotation cycles and one annotation review-and-correction cycle, involving a total of 70 documents, were completed, with slightly over 32 hours required for the entire annotation effort; where possible, the annotation cycles started with bootstrapped files resulting from the application of TBL …


A Breadth Of Nlp Applications, Elizabeth D. Liddy Jan 2002

School of Information Studies - Faculty Scholarship

The Center for Natural Language Processing (CNLP) was founded in September 1999 in the School of Information Studies, the “Original Information School”, at Syracuse University. CNLP’s mission is to advance the development of human-like, language understanding software capabilities for government, commercial, and consumer applications. The Center conducts both basic and applied research, building on its recognized capabilities in Natural Language Processing. The Center’s seventeen employees are a mix of doctoral students in information science or computer engineering, software engineers, linguistic analysts, and research engineers.


Natural Language Processing, Elizabeth D. Liddy Jan 2001

School of Information Studies - Faculty Scholarship

Natural Language Processing (NLP) is a computerized approach to analyzing text that is based on both a set of theories and a set of technologies. Because it is a very active area of research and development, there is no single agreed-upon definition that would satisfy everyone, but there are some aspects that would be part of any knowledgeable person's definition.


Searching And Search Engines: When Is Current Research Going To Lead To Major Progress?, Elizabeth D. Liddy Jan 2000

School of Information Studies - Faculty Scholarship

For many years, users of commercial search engines have been told that the latest information and computer science research will improve the quality of the engines they rely on to fulfill their daily information needs. These promises, however, have not been kept. While the Internet has dramatically increased the amount of information to which users have access, the key issue remains unresolved – results for substantive queries are not improving. Yet the past need not predict the future, because sophisticated advances in Natural Language Processing (NLP) have, in fact, …


DR-LINK: A System Update For TREC-2, Elizabeth D. Liddy, Sung H. Myaeng Jan 1994

School of Information Studies - Faculty Scholarship

The theoretical goal underlying the DR-LINK System is to represent and match documents and queries at the various linguistic levels at which human language conveys meaning. Accordingly, we have developed a modular system which processes and represents text at the lexical, syntactic, semantic, and discourse levels of language. In concert, these levels of processing permit DR-LINK to achieve a level of intelligent retrieval beyond more traditional approaches. In addition, the rich annotations to text produced by DR-LINK are replete with much of the semantics necessary for document extraction. The system was planned and developed in a modular fashion and functional …