Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 7 of 7

Full-Text Articles in Databases and Information Systems

Investigating Toxicity Changes Of Cross-Community Redditors From 2 Billion Posts And Comments, Hind Almerekhi, Haewoon Kwak, Bernard J. Jansen Aug 2022

Research Collection School Of Computing and Information Systems

This research investigates changes in online behavior of users who publish in multiple communities on Reddit by measuring their toxicity at two levels. With the aid of crowdsourcing, we built a labeled dataset of 10,083 Reddit comments, then used the dataset to train and fine-tune a Bidirectional Encoder Representations from Transformers (BERT) neural network model. The model predicted the toxicity levels of 87,376,912 posts from 577,835 users and 2,205,581,786 comments from 890,913 users on Reddit over 16 years, from 2005 to 2020. This study utilized the toxicity levels of user content to identify toxicity changes by the user within the …
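As a rough illustration of the kind of per-user, per-community comparison described above, the sketch below assumes each comment already carries a toxicity label (0 or 1) from an upstream classifier. For each user, it computes the share of toxic comments in each community and then the spread between the user's most and least toxic community. The record layout, the function names `toxicity_by_community` and `toxicity_spread`, and the max-minus-min "change" measure are hypothetical choices for illustration, not the study's method.

```python
from collections import defaultdict

def toxicity_by_community(records):
    """Share of toxic comments per (user, community).

    records: iterable of (user, community, is_toxic) tuples, where is_toxic is
    a 0/1 label assumed to come from an upstream classifier (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # (user, community) -> [toxic, total]
    for user, community, is_toxic in records:
        c = counts[(user, community)]
        c[0] += is_toxic
        c[1] += 1
    rates = defaultdict(dict)
    for (user, community), (toxic, total) in counts.items():
        rates[user][community] = toxic / total
    return dict(rates)

def toxicity_spread(rates):
    """Hypothetical change measure: most toxic minus least toxic community.

    Users active in only one community are skipped.
    """
    return {user: max(r.values()) - min(r.values())
            for user, r in rates.items() if len(r) > 1}

if __name__ == "__main__":
    sample = [
        ("alice", "r/news", 1), ("alice", "r/news", 0),
        ("alice", "r/aww", 0), ("alice", "r/aww", 0),
        ("bob", "r/news", 0),
    ]
    rates = toxicity_by_community(sample)
    print(rates["alice"])          # {'r/news': 0.5, 'r/aww': 0.0}
    print(toxicity_spread(rates))  # {'alice': 0.5}
```

Skipping single-community users keeps the comparison focused on cross-community posters, which is the population the abstract describes.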


Anatomy Of Online Hate: Developing A Taxonomy And Machine Learning Models For Identifying And Classifying Hate In Online News Media, Joni Salminen, Hind Almerekhi, Milica Milenkovic, Soon-Gyu Jung, Haewoon Kwak, Bernard J. Jansen Jan 2018

Research Collection School Of Computing and Information Systems

Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions within a dataset of 137,098 comments posted to the YouTube and Facebook videos of an online news media outlet. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both …


A Novel Approach For Classifying Gene Expression Data Using Topic Modeling, Soon Jye Kho, Himi Yalamanchili, Michael L. Raymer, Amit Sheth Jan 2017

Kno.e.sis Publications

Understanding the role of differential gene expression in cancer etiology and cellular processes is a complex problem that continues to pose a challenge due to the sheer number of genes and inter-related biological processes involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to mitigate overfitting of high-dimensional gene expression data and to facilitate understanding of the associated pathways. LDA has recently been applied to clustering and exploring genomic data, but not to classification and prediction. Here, we propose to use LDA in clustering as well as in classification of cancer and healthy tissues using lung cancer …


Spiteful, One-Off, And Kind: Predicting Customer Feedback Behavior On Twitter, Agus Sulistya, Abhishek Sharma, David Lo Nov 2016

Research Collection School Of Computing and Information Systems

Social media provides a convenient way for customers to express their feedback to companies. Identifying different types of customers based on their feedback behavior can help companies retain their customers. In this paper, we use a machine learning approach to predict a customer’s feedback behavior based on her first feedback tweet. First, we identify a few categories of customers based on their feedback frequency and the sentiment of the feedback. We identify three main categories: spiteful, one-off, and kind. Next, we build a model to predict the category of a customer given her first feedback. We use profile and …
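The abstract does not give the exact category definitions, so the following is only a hypothetical illustration of labeling customers from feedback frequency and sentiment. A customer with a single feedback tweet is labeled "one-off". Repeat customers are labeled "spiteful" or "kind" depending on whether their average sentiment is negative or positive. The sentiment scale ([-1, 1]), the zero threshold, the "other" fallback, and the function name `categorize_customer` are all assumptions.

```python
from statistics import mean

def categorize_customer(sentiments):
    """Assign a hypothetical category from a customer's feedback-tweet sentiments.

    sentiments: one score per feedback tweet, assumed to lie in [-1, 1].
    """
    if not sentiments:
        raise ValueError("customer has no feedback tweets")
    if len(sentiments) == 1:
        return "one-off"   # a single feedback tweet
    avg = mean(sentiments)
    if avg < 0:
        return "spiteful"  # repeated feedback that is negative on average
    if avg > 0:
        return "kind"      # repeated feedback that is positive on average
    return "other"         # repeated feedback that is neutral on average

if __name__ == "__main__":
    print(categorize_customer([0.9]))              # one-off
    print(categorize_customer([-0.8, -0.5, 0.1]))  # spiteful
    print(categorize_customer([0.4, 0.6]))         # kind
```

A rule like this would only produce training labels. The prediction task in the abstract goes further and must infer the category from the first feedback tweet alone.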


A Comparison Of Fundamental Network Formation Principles Between Offline And Online Friends On Twitter, Felicia Natali, Feida Zhu Jan 2016

Research Collection School Of Computing and Information Systems

We investigate the differences between how some of the fundamental principles of network formation apply among offline friends and how they apply among online friends on Twitter. We consider three fundamental principles of network formation proposed by Schaefer et al.: reciprocity, popularity, and triadic closure. Overall, we discover that these principles mainly apply to offline friends on Twitter. Based on how these principles apply to offline versus online friends, we formulate rules to predict offline friendship on Twitter. We compare our algorithm with popular machine learning algorithms and Xiewei’s random walk algorithm. Our algorithm beats the machine learning algorithms on …
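To make the three principles concrete, the sketch below computes simple indicators for a pair of accounts on a directed "follows" graph. It flags whether the follow is reciprocated, uses the target's in-degree as a popularity proxy, and counts shared neighbors as a triadic-closure signal. These feature definitions are common readings of the terms, assumed here for illustration. They are not the prediction rules the authors formulated, and `build_graph` and `pair_features` are hypothetical names.

```python
from collections import defaultdict

def build_graph(edges):
    """Index directed 'a follows b' edges in both directions."""
    followees, followers = defaultdict(set), defaultdict(set)
    for a, b in edges:
        followees[a].add(b)
        followers[b].add(a)
    return followees, followers

def pair_features(a, b, followees, followers):
    """Illustrative reciprocity, popularity, and triadic-closure features for (a, b)."""
    reciprocal = b in followees[a] and a in followees[b]
    popularity = len(followers[b])  # in-degree of b
    # Accounts linked to both a and b in either direction (excluding the pair itself).
    near_a = (followees[a] | followers[a]) - {b}
    near_b = (followees[b] | followers[b]) - {a}
    return {
        "reciprocal": reciprocal,
        "popularity": popularity,
        "common_neighbors": len(near_a & near_b),
    }

if __name__ == "__main__":
    edges = [("ann", "bo"), ("bo", "ann"), ("ann", "cy"), ("cy", "bo"), ("dee", "bo")]
    followees, followers = build_graph(edges)
    print(pair_features("ann", "bo", followees, followers))
    # {'reciprocal': True, 'popularity': 3, 'common_neighbors': 1}
```

Features like these could feed either hand-written rules or a standard classifier, which mirrors the comparison the abstract describes between rule-based prediction and off-the-shelf machine learning algorithms.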


Use Of A High-Value Social Audience Index For Target Audience Identification On Twitter, Siaw Ling Lo, David Cornforth, Raymond Chiong Feb 2015

Research Collection School Of Computing and Information Systems

With the large and growing user base of social media, it is not an easy feat to identify potential customers for a business. This is mainly due to the challenge of extracting commercially viable content from the vast amount of free-form conversations. In this paper, we analyse the Twitter content of an account owner and its list of followers through various text mining methods and segment the list of followers via an index, which we term the High-Value Social Audience (HVSA) index. This HVSA index enables a company or organisation to devise their marketing and engagement plan according …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict user affiliation attributes that suggest user participation in different social groups.