Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 10 of 10

Full-Text Articles in Databases and Information Systems

Distinctive Features Of Nonverbal Behavior And Mimicry In Application Interviews Through Data Analysis And Machine Learning, Sanne Rogiers, Elias Corneillie, Filip Lievens, Frederik Anseel, Peter Veelaert, Wilfried Philips Sep 2022

Research Collection Lee Kong Chian School Of Business

This paper reveals the characteristics and effects of nonverbal behavior and human mimicry in the context of application interviews. It presents a novel analysis method for psychological research that uses machine learning. Compared with traditional manual data analysis, machine learning proves able to analyze the data more deeply and to discover connections in the data that are invisible to the human eye. The paper describes an experiment to measure and analyze the reactions of evaluators to job applicants who adopt specific behaviors: mimicry, suppress, immediacy and natural behavior. First, evaluation of the applicant qualifications by the interviewer reveals …


The State Of The Art Of Information Integration In Space Applications, Zhuming Bi, K. L. Yung, Andrew W.H. Ip, Yuk Ming Tang, Chris W.J. Zhang, Li Da Xu Jan 2022

Information Technology & Decision Sciences Faculty Publications

This paper aims to present a comprehensive survey on information integration (II) in space informatics. With the ever-increasing scale and dynamics of complex space systems, II has become essential in dealing with the complexity, changes, dynamics, and uncertainties of space systems. The applications of space II (SII) must address some distinctive functional requirements (FRs) of heterogeneity, networking, communication, security, latency, and resilience, yet few works examine recent advances in SII thoroughly. This survey helps readers understand the state of the art of SII in the sense that (1) technical drivers for SII are discussed and …


DisMASTD: An Efficient Distributed Multi-Aspect Streaming Tensor Decomposition, Keyu Yang, Yunjun Gao, Yifeng Shen, Baihua Zheng, Lu Chen Apr 2021

Research Collection School Of Computing and Information Systems

Tensor decomposition is a fundamental multidimensional data analysis tool for many data-driven applications, such as social computing, computer vision, and bioinformatics, to name but a few. However, rapidly growing volumes of streaming data introduce new challenges to traditional static tensor decomposition. Such data require an efficient distributed dynamic tensor decomposition that does not re-compute the whole tensor from scratch. In this paper, we propose DisMASTD, an efficient distributed multi-aspect streaming tensor decomposition. First, we prove the optimal tensor partitioning problem is NP-hard. Second, we present two heuristic tensor partitioning approaches to ensure load balancing. Third, we develop a distributed multi-aspect streaming tensor …


Assessing Topical Homogeneity With Word Embedding And Distance Matrices, Jeffrey M. Stanton, Yisi Sang Oct 2020

School of Information Studies - Faculty Scholarship

Researchers from many fields have used statistical tools to make sense of large bodies of text. Many tools support quantitative analysis of documents within a corpus, but relatively few studies have examined statistical characteristics of whole corpora. Statistical summaries of whole corpora and comparisons between corpora have potential application in the analysis of topically organized applications such as social media platforms. In this study, we created matrix representations of several corpora and examined several statistical tests to make comparisons between pairs of corpora with respect to the topical homogeneity of documents within each corpus. Results of three experiments suggested that a …
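
As a rough illustration of comparing corpora through distance matrices, the sketch below computes a pairwise cosine-distance matrix over document vectors and summarizes it by its mean off-diagonal value. It is not the authors' procedure: the function names, the toy two-dimensional vectors, and the choice of mean cosine distance as a homogeneity summary are all assumptions made for this example. In practice the vectors would come from a word-embedding model.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(vectors):
    """Build the full pairwise distance matrix for a corpus of document vectors."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]

def mean_pairwise_distance(vectors):
    """Average the upper triangle of the distance matrix (needs >= 2 documents)."""
    n = len(vectors)
    matrix = distance_matrix(vectors)
    total = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

# Hypothetical document vectors: one topically tight corpus, one mixed corpus.
tight_corpus = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.1)]
mixed_corpus = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

tight_score = mean_pairwise_distance(tight_corpus)
mixed_score = mean_pairwise_distance(mixed_corpus)
```

Under this assumed summary, a lower mean distance indicates documents that sit closer together, so `tight_score` comes out smaller than `mixed_score`.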


Identifying Regional Trends In Avatar Customization, Peter Mawhorter, Sercan Sengun, Haewoon Kwak, D. Fox Harrell Dec 2019

Research Collection School Of Computing and Information Systems

Since virtual identities such as social media profiles and avatars have become a common venue for self-expression, it has become important to consider the ways in which existing systems embed the values of their designers. To design virtual identity systems that reflect the needs and preferences of diverse users, it is important to understand how virtual identity construction differs between groups. This paper presents a new methodology that leverages deep learning and differential clustering for comparative analysis of profile images, with a case study of almost 100 000 avatars from a large online community using a popular avatar creation …


Smartphone Sensing Meets Transport Data: A Collaborative Framework For Transportation Service Analytics, Yu Lu, Archan Misra, Wen Sun, Huayu Wu Aug 2017

Research Collection School Of Computing and Information Systems

We advocate for and introduce TRANSense, a framework for urban transportation service analytics that combines participatory smartphone sensing data with city-scale transportation-related transactional data (taxis, trains etc.). Our work is driven by the observed limitations of using each data type in isolation: (a) commonly-used anonymous city-scale datasets (such as taxi bookings and GPS trajectories) provide insights into the aggregate behavior of transport infrastructure, but fail to reveal individual-specific transport experiences (e.g., wait times in taxi queues); while (b) mobile sensing data can capture individual-specific commuting-related activities, but suffers from accuracy and energy overhead challenges due to usage artefacts and lack …


Harnessing The Power Of Text Mining For The Detection Of Abusive Content In Social Media, Hao Chen, Susan Mckeever, Sarah Jane Delany Jan 2016

Conference papers

The issues of cyberbullying and online harassment have gained considerable coverage in the last number of years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and …


An Approach To Nearest Neighboring Search For Multi-Dimensional Data, Yong Shi, Li Zhang, Lei Zhu Mar 2011

Faculty and Research Publications

Finding nearest neighbors in large multi-dimensional data has always been one of the research interests in the data mining field. In this paper, we present our continuing research on similarity search problems. Previously we have worked on exploring the meaning of K nearest neighbors from a new perspective in PanKNN [20]. It redefines the distances between data points and a given query point Q, efficiently and effectively selecting data points which are closest to Q. It can be applied in various data mining fields. Many real data sets contain irrelevant or obstacle information, which greatly affects the effectiveness …
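
For orientation, the baseline problem of selecting the K data points closest to a query point Q can be sketched with a brute-force search. This is a plain Euclidean-distance baseline, not PanKNN, which the abstract says redefines the distance. The function name and sample points are hypothetical.

```python
import heapq
import math

def k_nearest(points, query, k):
    """Return the k points closest to query by Euclidean distance, nearest first."""
    # Brute force: every point is compared against the query.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

# Hypothetical two-dimensional data set and query point Q.
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
q = (0.0, 0.0)
nearest = k_nearest(data, q, 2)
```

The scan touches every point for each query, which is the cost that index-based and redefined-distance approaches aim to reduce on large multi-dimensional data.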


SGPM: Static Group Pattern Mining Using Apriori-Like Sliding Window, John Goh, David Taniar, Ee Peng Lim Apr 2006

Research Collection School Of Computing and Information Systems

Mobile user data mining is a field that focuses on extracting interesting patterns and knowledge from data generated by mobile users. Group pattern mining is one type of mobile user data mining method. In group pattern mining, group patterns in a given user movement database are found based on spatio-temporal distances. In this paper, we propose an efficiency improvement that uses an area method for locating mobile users and a sliding window for static group pattern mining. This reduces the complexity of the valid group pattern mining problem. We support the use of static method, which uses areas and sliding windows instead …


FISA: Feature-Based Instance Selection For Imbalanced Text Classification, Aixin Sun, Ee Peng Lim, Boualem Benatallah, Mahbub Hassan Apr 2006

Research Collection School Of Computing and Information Systems

Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning …
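
The general idea of training on only a subset of the negative documents can be illustrated with random undersampling. This is a simple baseline swapped in for illustration and is not FISA, whose feature-based selection is not described in the excerpt. The function name, the `(text, label)` tuple layout, and the toy data are assumptions.

```python
import random

def undersample_negatives(examples, keep_fraction, seed=0):
    """Keep every positive example and a random fraction of the negatives.

    examples: list of (text, label) pairs, label 1 = positive, 0 = negative.
    """
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    keep = round(len(negatives) * keep_fraction)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return positives + rng.sample(negatives, keep)

# Hypothetical imbalanced training set: 5 positives, 100 negatives.
docs = [(f"pos-{i}", 1) for i in range(5)] + [(f"neg-{i}", 0) for i in range(100)]
subset = undersample_negatives(docs, 0.35)
```

With `keep_fraction=0.35`, the subset holds all 5 positives and 35 of the 100 negatives. A selection method informed by features would choose which negatives to keep rather than sampling them uniformly.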