Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 4 of 4

Full-Text Articles in Computational Linguistics

Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …


Using Textual Features To Predict Popular Content On Digg, Paul H. Miller May 2011

Using Textual Features To Predict Popular Content On Digg, Paul H. Miller

Paul H Miller

Over the past few years, collaborative rating sites, such as Netflix, Digg and Stumble, have become increasingly prevalent sites for users to find trending content. I used various data mining techniques to study Digg, a social news site, to examine the influence of content on popularity. What influence does content have on popularity, and what influence does content have on users’ decisions? Overwhelmingly, prior studies have consistently shown that predicting popularity based on content is difficult and maybe even inherently impossible. The same submission can have multiple outcomes and content neither determines popularity, nor individual user decisions. My results show …


Using Textual Features To Predict Popular Content On Digg, Paul H. Miller Apr 2011

Using Textual Features To Predict Popular Content On Digg, Paul H. Miller

Department of English: Dissertations, Theses, and Student Research

Over the past few years, collaborative rating sites, such as Netflix, Digg and Stumble, have become increasingly prevalent sites for users to find trending content. I used various data mining techniques to study Digg, a social news site, to examine the influence of content on popularity. What influence does content have on popularity, and what influence does content have on users’ decisions? Overwhelmingly, prior studies have consistently shown that predicting popularity based on content is difficult and maybe even inherently impossible. The same submission can have multiple outcomes and content neither determines popularity, nor individual user decisions. My results show …


The Impact Of Directionality In Predications On Text Mining, Gondy Leroy, Marcelo Fiszman, Thomas C. Rindflesch Jan 2008

The Impact Of Directionality In Predications On Text Mining, Gondy Leroy, Marcelo Fiszman, Thomas C. Rindflesch

CGU Faculty Publications and Research

The number of publications in biomedicine is increasing enormously each year. To help researchers digest the information in these documents, text mining tools are being developed that present co-occurrence relations between concepts. Statistical measures are used to mine interesting subsets of relations. We demonstrate how directionality of these relations affects interestingness. Support and confidence, simple data mining statistics, are used as proxies for interestingness metrics. We first built a test bed of 126,404 directional relations extracted from biomedical abstracts, which we represent as graphs containing a central starting concept and 2 rings of associated relations. We manipulated directionality in four …