Open Access. Powered by Scholars. Published by Universities.®

OS and Networks Commons

Full-Text Articles in OS and Networks

Active Learning With Efficient Feature Weighting Methods For Improving Data Quality And Classification Accuracy, Justin Martineau, Lu Chen, Doreen Cheng, Amit P. Sheth Jun 2014

Kno.e.sis Publications

Many machine learning datasets are noisy, with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances, significantly improving annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques. Our techniques save a considerable …


Using Statistical Methods To Determine Geolocation Via Twitter, Christopher M. Wright May 2014

Masters Theses & Specialist Projects

With the ever-expanding usage of social media websites such as Twitter, it is possible to use statistical inquiries to infer a person's geographic location solely from the content of their tweets. In a 2010 study, Zhiyuan Cheng was able to place a Twitter user within 100 miles of their actual location 51% of the time. While this may already seem like a significant finding, the study was done while Twitter was still finding its footing. In 2010, Twitter had 75 million unique registered users; as of March 2013, …


Hierarchical Interest Graph From Tweets, Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit P. Sheth Apr 2014

Kno.e.sis Publications

Industry and researchers have identified numerous ways to monetize microblogs for personalization and recommendation. A common challenge across these different works is the identification of user interests. Although techniques have been developed to address this challenge, a flexible approach that spans multiple levels of granularity in user interests has not been forthcoming. In this work, we focus on exploiting hierarchical semantics of concepts to infer richer user interests expressed as a Hierarchical Interest Graph. To create such graphs, we utilize users' tweets to first ground potential user interests to structured background knowledge such as the Wikipedia Category Graph. We then adapt …


Cursing In English On Twitter, Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth Feb 2014

Kno.e.sis Publications

Cursing is not uncommon during conversations in the physical world: 0.5% to 0.7% of all the words we speak are curse words, given that 1% of all the words are first-person plural pronouns (e.g., we, us, our). On social media, people can instantly chat with friends without face-to-face interaction, usually in a more public fashion and broadly disseminated through highly connected social networks. Will these distinctive features of social media lead to a change in people's cursing behavior? In this paper, we examine the characteristics of cursing activity on a popular social media platform - Twitter, involving the analysis of …


Location Prediction Of Twitter Users Using Wikipedia, Revathy Krishnamurthy, Pavan Kapanipathi, Amit P. Sheth, Krishnaprasad Thirunarayan Jan 2014

Kno.e.sis Publications

The mining of user-generated content in social media has proven very effective in domains ranging from personalization and recommendation systems to crisis management. Knowledge of online users' locations makes their tweets more informative and adds another dimension to their analysis. Existing approaches to predicting the location of Twitter users are purely data-driven and require large training data sets of geo-tagged tweets. Collecting and modelling these tweets can be time-intensive. To overcome this drawback, we propose a novel knowledge-based approach that does not require any training data. Our approach uses information in Wikipedia, about cities …


An Exploratory Analysis Of Twitter Keyword-Hashtag Networks And Knowledge Discovery Applications, Ahmed A. Hamed Jan 2014

Graduate College Dissertations and Theses

The emergence of social media has impacted the way people think, communicate, behave, learn, and conduct research. In recent years, a large number of studies have analyzed and modeled this social phenomenon. Driven by commercial and social interests, social media has become an attractive subject for researchers. Accordingly, new models, algorithms, and applications to address specific domains and solve distinct problems have emerged. In this thesis, we propose a novel network model and a path mining algorithm called HashnetMiner to discover implicit knowledge that is not easily exposed using other network models. Our experiments using HashnetMiner have demonstrated anecdotal evidence …