Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 13 of 13

Full-Text Articles in Physical Sciences and Mathematics

Twitter Location (Sometimes) Matters: Exploring The Relationship Between Georeferenced Tweet Content And Nearby Feature Classes, Stefan Hahmann, Ross S. Purves, Dirk Burghardt Dec 2014

Journal of Spatial Information Science

In this paper, we investigate whether microblogging texts (tweets) produced on mobile devices are related to the geographical locations where they were posted. For this purpose, we correlate tweet topics to areas. In doing so, classified points of interest from OpenStreetMap serve as validation points. We adopted the classification and geolocation of these points to correlate with tweet content by means of manual, supervised, and unsupervised machine learning approaches. Evaluation showed the manual classification approach to be of the highest quality, followed by the supervised method, while the unsupervised classification was of low quality. We found that the degree to which …


Issues Of Social Data Analytics With A New Method For Sentiment Analysis Of Social Media Data, Zhaoxia Wang, Victor J. C. Tong, David Chan Dec 2014

Research Collection School of Social Sciences

Social media data consists of feedback, critiques and other comments that are posted online by internet users. Collectively, these comments may reflect sentiments that are sometimes not captured in traditional data collection methods such as administering a survey questionnaire. Thus, social media data offers a rich source of information, which can be adequately analyzed and understood. In this paper, we survey the extant research literature on sentiment analysis and discuss various limitations of the existing analytical methods. A major limitation in the large majority of existing research is the exclusive focus on social media data in the English language. There …


Anomaly Detection Through Enhanced Sentiment Analysis On Social Media Data, Zhaoxia Wang, Victor Joo, Chuan Tong, Xin Xin, Hoong Chor Chin Dec 2014

Research Collection School Of Computing and Information Systems

Anomaly detection in sentiment analysis refers to detecting abnormal opinions, sentiment patterns or special temporal aspects of such patterns in a collection of data. The anomalies detected may be due to sudden sentiment changes hidden in large amounts of text. If these anomalies are undetected or poorly managed, the consequences may be severe, e.g., a business may lose customers who express negative sentiments and no longer support the establishment. Social media platforms, such as Twitter, provide a vast source of information, which includes user feedback, opinions and information on most issues. Many organizations also leverage social media platforms to publish information …


Data Preparation For Social Network Mining And Analysis, Yazhe Wang Dec 2014

Dissertations and Theses Collection (Open Access)

This dissertation studies the problem of preparing good-quality social network data for data analysis and mining. Modern online social networks such as Twitter, Facebook, and LinkedIn have rapidly grown in popularity. The consequent availability of a wealth of social network data provides an unprecedented opportunity for data analysis and mining researchers to determine useful and actionable information in a wide variety of fields such as social sciences, marketing, management, and security. However, raw social network data are vast, noisy, distributed, and sensitive in nature, which poses challenges for data mining and analysis tasks in terms of storage, efficiency, accuracy, etc. Many mining algorithms cannot …


Sharing Political News: The Balancing Act Of Intimacy And Socialization In Selective Exposure, Jisun An, Daniele Quercia, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft Sep 2014

Research Collection School Of Computing and Information Systems

One might think that, compared to traditional media, social media sites allow people to choose more freely what to read and what to share, especially for politically oriented news. However, reading and sharing habits originate from deeply ingrained behaviors that might be hard to change. To test the extent to which this is true, we propose a Political News Sharing (PoNS) model that holistically captures four key aspects of social psychology: gratification, selective exposure, socialization, and trust & intimacy. Using real instances of political news sharing in Twitter, we study the predictive power of these features. As one might expect, …


On Macro And Micro Exploration Of Hashtag Diffusion In Twitter, Yazhe Wang, Baihua Zheng Aug 2014

Research Collection School Of Computing and Information Systems

This exploratory work studies hashtag diffusion in Twitter. The analysis is conducted from two aspects. From the macro perspective, we study general properties of hashtag diffusion, and classify hashtags into three main classes based on their temporal dynamics, referred to as 'single spike', 'multi-spikes', and 'fluctuation', and find that each of these classes has some unique characteristics. From the micro perspective, we investigate individual diffusion. We adopt Edelman's 'topology of influence' theory to identify four types of users with different influence levels in diffusion based on their dynamic retweet behaviors. The results of our study are useful for gaining more insights of …


Active Learning With Efficient Feature Weighting Methods For Improving Data Quality And Classification Accuracy, Justin Martineau, Lu Chen, Doreen Cheng, Amit P. Sheth Jun 2014

Kno.e.sis Publications

Many machine learning datasets are noisy with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances to significantly improve annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques. Our techniques save a considerable …


Using Statistical Methods To Determine Geolocation Via Twitter, Christopher M. Wright May 2014

Masters Theses & Specialist Projects

With the ever-expanding usage of social media websites such as Twitter, it is possible to use statistical inquiries to infer the geographic location of a person using solely the content of their tweets. In a study done in 2010, Zhiyuan Cheng was able to detect the location of a Twitter user within 100 miles of their actual location 51% of the time. While this may seem like an already significant find, this study was done while Twitter was still finding its footing. In 2010, Twitter had 75 million unique users registered; as of March 2013, …


Hierarchical Interest Graph From Tweets, Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit P. Sheth Apr 2014

Kno.e.sis Publications

Industry and researchers have identified numerous ways to monetize microblogs for personalization and recommendation. A common challenge across these different works is the identification of user interests. Although techniques have been developed to address this challenge, a flexible approach that spans multiple levels of granularity in user interests has not been forthcoming. In this work, we focus on exploiting hierarchical semantics of concepts to infer richer user interests expressed as a Hierarchical Interest Graph. To create such graphs, we utilize users' tweets to first ground potential user interests to structured background knowledge such as the Wikipedia Category Graph. We then adapt …


Cursing In English On Twitter, Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth Feb 2014

Kno.e.sis Publications

Cursing is not uncommon during conversations in the physical world: 0.5% to 0.7% of all the words we speak are curse words, given that 1% of all the words are first-person plural pronouns (e.g., we, us, our). On social media, people can instantly chat with friends without face-to-face interaction, usually in a more public fashion and broadly disseminated through a highly connected social network. Will these distinctive features of social media lead to a change in people's cursing behavior? In this paper, we examine the characteristics of cursing activity on a popular social media platform - Twitter, involving the analysis of …


Location Prediction Of Twitter Users Using Wikipedia, Revathy Krishnamurthy, Pavan Kapanipathi, Amit P. Sheth, Krishnaprasad Thirunarayan Jan 2014

Kno.e.sis Publications

The mining of user-generated content in social media has proven very effective in domains ranging from personalization and recommendation systems to crisis management. The knowledge of online users' locations makes their tweets more informative and adds another dimension to their analysis. Existing approaches to predict the location of Twitter users are purely data-driven and require large training data sets of geo-tagged tweets. The collection and modelling process of tweets can be time intensive. To overcome this drawback, we propose a novel knowledge-based approach that does not require any training data. Our approach uses information in Wikipedia, about cities …


An Exploratory Analysis Of Twitter Keyword-Hashtag Networks And Knowledge Discovery Applications, Ahmed A. Hamed Jan 2014

Graduate College Dissertations and Theses

The emergence of social media has impacted the way people think, communicate, behave, learn, and conduct research. In recent years, a large number of studies have analyzed and modeled this social phenomenon. Driven by commercial and social interests, social media has become an attractive subject for researchers. Accordingly, new models, algorithms, and applications to address specific domains and solve distinct problems have emerged. In this thesis, we propose a novel network model and a path mining algorithm called HashnetMiner to discover implicit knowledge that is not easily exposed using other network models. Our experiments using HashnetMiner have demonstrated anecdotal evidence …


Birds Of A Feather Deceive Together: The Chicanery Of Multiplied Metadata, David M. Cook Dec 2013

Dr. David M Cook

New Media conventions have fluttered along unforeseen flight paths. By combining sock-puppetry with the grouping power of metadata it is possible to demonstrate widespread influence through Twitter dispersion. In one nest there is a growing use of sock-puppetry accentuated by the exploitation of a social media that does not attempt to verify proof of identity. Created identities in their thousands can flock towards, and in support of, a single identity. They do so alongside legitimate accounts but in concert remain imperceptible within an overall group. In another nest there is the practise of homophily, captured through metadata, and used to …