Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

Social and Behavioral Sciences

Articles 1 - 30 of 34

Full-Text Articles in Computer Sciences

Data-Driven Operational And Safety Analysis Of Emerging Shared Electric Scooter Systems, Qingyu Ma Dec 2021

Computational Modeling & Simulation Engineering Theses & Dissertations

The rapid rise of shared electric scooter (E-Scooter) systems offers many urban areas a new micro-mobility solution. The portable and flexible characteristics have made E-Scooters a competitive mode for short-distance trips. Compared to other modes such as bikes, E-Scooters allow riders to freely ride on different facilities such as streets, sidewalks, and bike lanes. However, sharing lanes with vehicles and other users tends to cause safety issues for riding E-Scooters. Conventional methods are often not applicable for analyzing such safety issues because well-archived historical crash records are not commonly available for emerging E-Scooters.

Perceiving the growth of such a micro-mobility …


Toward Tweet-Mining Framework For Extracting Terrorist Attack-Related Information And Reporting, Farkhund Iqbal, Rabia Batool, Benjamin C. M. Fung, Saiqa Aleem, Ahmed Abbasi, Abdul Rehman Javed Jan 2021

All Works

The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific …


Applied Deep Learning In Intelligent Transportation Systems And Embedding Exploration, Xiaoyuan Liang Aug 2019

Dissertations

Deep learning techniques have achieved tremendous success in many real-world applications in recent years and show great potential in many areas, including transportation. Although transportation has become increasingly indispensable in people’s daily lives, related problems such as traffic congestion and energy waste have not been completely solved, and some have become even more critical. This dissertation uses deep learning to address three fundamental problems in the transportation field: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization …


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko Jun 2019

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced if the cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in …


Citationally Enhanced Semantic Literature Based Discovery, John David Fleig Jan 2019

CCE Theses and Dissertations

We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress: despite its amazing abilities, the human mind is woefully inadequate at processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find you a document whose content best matches your query - if …


Data Mining Approach To The Detection Of Suicide In Social Media: A Case Study Of Singapore, Jane H. K. Seah, Kyong Jin Shim Dec 2018

Research Collection School Of Computing and Information Systems

In this research, we focus on the social phenomenon of suicide. Specifically, we perform social sensing on digital traces obtained from Reddit. We analyze the posts and comments that are related to depression and suicide. We perform natural language processing to better understand different aspects of human life that relate to suicide.


Traffic-Cascade: Mining And Visualizing Lifecycles Of Traffic Congestion Events Using Public Bus Trajectories, Agus Trisnajaya Kwee, Meng-Fen Chiang, Philips Kokoh Prasetyo, Ee-Peng Lim Oct 2018

Research Collection School Of Computing and Information Systems

As road transportation supports both economic and social activities in developed cities, it is important to maintain smooth traffic on all highways and local roads. Whenever possible, traffic congestion should be detected early and resolved quickly. While traffic monitoring dashboard systems have been put in place in many cities, these systems require high-cost vehicle speed monitoring instruments and detect traffic congestion as independent events. There is a lack of low-cost dashboards to inspect and analyze the lifecycle of traffic congestion, which is critical in assessing the overall impact of congestion, determining the possible source(s) of congestion and its …


Learning Latent Characteristics Of Locations Using Location-Based Social Networking Data, Thanh Nam Doan May 2018

Dissertations and Theses Collection (Open Access)

This dissertation addresses the modeling of latent characteristics of locations to describe the mobility of users of location-based social networking platforms. With many users signing up for location-based social networking platforms to share their daily activities, these platforms have become a gold mine for researchers studying human visitation behavior and location characteristics. Modeling such visitation behavior and location characteristics can benefit many useful applications such as urban planning and location-aware recommender systems. In this dissertation, we focus on modeling two latent characteristics of locations, namely area attraction and neighborhood competition effects, using location-based social network data. Our literature survey …


Clinical Information Extraction From Unstructured Free-Texts, Mingzhe Tao Jan 2018

Legacy Theses & Dissertations (2009 - 2024)

Information extraction (IE) is a fundamental component of natural language processing (NLP) that provides a deeper understanding of the texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically-relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, clinically-relevant information is presented in an unstructured format, predominantly consisting of free-texts, making it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminologies makes automatic extraction …


So What Are You Going To Do With That? The Promises And Pitfalls Of Massive Data Sets, Sigrid Anderson Cordell, Melissa Gomis Jan 2017

UNL Libraries: Faculty Publications

This article takes as its case study the challenge of data sets for text mining, sources that offer tremendous promise for digital humanities (DH) methodology but present specific challenges for humanities scholars. These text sets raise a range of issues: What skills do you train humanists to have? What is the library’s role in enabling and supporting use of those materials? How do you allocate staff? Who oversees sustainability and data management? By addressing these questions through a specific use case scenario, this article shows how these questions are central to mapping out future directions for a range of library …


Detection Of Cyberbullying In SMS Messaging, Bryan W. Bradley Jul 2016

Computer Science Summer Fellows

Cyberbullying is a type of bullying that uses technology such as cell phones to harass or malign another person. To detect acts of cyberbullying, we are developing an algorithm that will detect cyberbullying in SMS (text) messages. Over 80,000 text messages have been collected by software installed on cell phones carried by participants in our study. This paper describes the development of the algorithm to detect cyberbullying messages, using the cell phone data collected previously. The algorithm works by first separating the messages into conversations in an automated way. The algorithm then analyzes the conversations and scores the severity and …


Mining And Clustering Mobility Evolution Patterns From Social Media For Urban Informatics, Chien-Cheng Chen, Meng-Fen Chiang, Wen-Chih Peng May 2016

Research Collection School Of Computing and Information Systems

In this paper, given a set of check-in data, we aim to discover the representative daily movement behavior of users in a city. For example, daily movement behavior on a weekday may show users moving from one spatial region to another, together with the associated time information. Since check-in data contain both spatial and temporal information, we propose a mobility evolution pattern to capture the daily movement behavior of users in a city. Furthermore, given a set of daily mobility evolution patterns, we formulate their similarity distances and then discover representative mobility evolution patterns via the clustering process. Representative mobility evolution patterns are …


Identifying Terrorist Affiliations Through Social Network Analysis Using Data Mining Techniques, Govand A. Ali Apr 2016

Information Technology Master Theses

In a technologically enabled world, local ideologically inspired warfare becomes global all too quickly. Terrorist groups such as Al Qaeda and ISIS (Daesh) have successfully used modern computing technology and social networking environments to broadcast their message, recruit new members, and plot attacks. This is especially true for platforms such as Twitter and encrypted mobile apps like Telegram or the clandestine Alrawi. As early detection of such activity is crucial to attack prevention, data mining techniques have become increasingly important in the fight against the spread of global terrorist activity. This study employs data mining tools to mine Twitter for …


Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri Jan 2016

Wayne State University Dissertations

Predicting time-to-event from longitudinal data where different events occur at different time points is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right-censored data (also called survival data). Instances whose event of interest has not yet been observed within a given observation time window are considered to be right censored. Effective models for predicting time-to-event labels from such right-censored data with good accuracy can have a significant impact in these …
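
To make the notion of right censoring in this abstract concrete, the following is a minimal sketch of the classical Kaplan-Meier survival estimator in Python. It is a standard textbook technique chosen here for illustration only, not one of the methods proposed in the dissertation; the function name `kaplan_meier` and the toy data are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the event, or until observation stopped
    observed:  True if the event was seen, False if right censored
    """
    # Number of observed events at each time point.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    # Every subject leaves the risk set at its own recorded time.
    exits = Counter(durations)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Censored subjects count in the denominator but not as events.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five subjects, two of them right censored (False).
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

On this toy input the estimated survival probability steps down to roughly 0.8, 0.6 and 0.3 at times 2, 3 and 5. The censored subjects never trigger a drop themselves, but they remain in the risk set until their last observed time, which is how the estimator uses partial information instead of discarding it.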


Capstone Projects Mining System For Insights And Recommendations, Melvrivk Aik Chun Goh, Swapna Gottipati, Venky Shankararaman Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we present a classification-based system to discover knowledge and trends in higher education students’ projects. Essentially, educational capstone projects provide an opportunity for students to apply what they have learned and prepare themselves for industry needs. Therefore, mining such projects gives insights into students’ experiences as well as industry project requirements and trends. In particular, we mine capstone projects executed by Information Systems students to discover patterns and insights related to people, organization, domain, industry needs and time. We build a capstone projects mining system (CPMS) based on classification models that leverage text mining, natural …


Intelligshop: Enabling Intelligent Shopping In Malls Through Location-Based Augmented Reality, Aditi Adhikari, Vincent W. Zheng, Hong Cao, Miao Lin, Yuan Fang, Kevin Chen-Chuan Chang Nov 2015

Research Collection School Of Computing and Information Systems

Shopping experience is important for both citizens and tourists. We present IntelligShop, a novel location-based augmented reality application that supports an intelligent shopping experience in malls. As its key functionality, IntelligShop provides an augmented reality interface: people can simply point their ubiquitous smartphones at mall retailers, and IntelligShop will automatically recognize the retailers and fetch their online reviews from various sources (including blogs, forums and publicly accessible social media) to display on the phones. Technically, IntelligShop addresses two challenging data mining problems: robust feature learning to support heterogeneous smartphones in localization, and learning to query for automatically gathering the retailer content …


Should We Use The Sample? Analyzing Datasets Sampled From Twitter's Stream API, Yazhe Wang, Jamie Callan, Baihua Zheng Jun 2015

Research Collection School Of Computing and Information Systems

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill …


Author Topic Model-Based Collaborative Filtering For Personalized POI Recommendations, Shuhui Jiang, Xueming Qian, Jialie Shen, Yun Fu, Tao Mei Jun 2015

Research Collection School Of Computing and Information Systems

Social media has created a continuous need for automatic travel recommendations. Collaborative filtering (CF) is the most well-known approach. However, existing approaches generally suffer from various weaknesses. For example, sparsity can significantly degrade the performance of traditional CF. If a user visits only a few locations, accurately identifying similar users becomes very challenging because there is too little information for effective inference. Moreover, existing recommendation approaches often ignore rich user information, such as textual descriptions of photos, which can reflect users' travel preferences. The topic model (TM) method is an effective way to solve the "sparsity problem," but is still far …


Twitter Location (Sometimes) Matters: Exploring The Relationship Between Georeferenced Tweet Content And Nearby Feature Classes, Stefan Hahmann, Ross S. Purves, Dirk Burghardt Dec 2014

Journal of Spatial Information Science

In this paper, we investigate whether microblogging texts (tweets) produced on mobile devices are related to the geographical locations where they were posted. For this purpose, we correlate tweet topics to areas. In doing so, classified points of interest from OpenStreetMap serve as validation points. We adopted the classification and geolocation of these points to correlate with tweet content by means of manual, supervised, and unsupervised machine learning approaches. Evaluation showed the manual classification approach to be of the highest quality, followed by the supervised method, while the unsupervised classification was of low quality. We found that the degree to which …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam Oct 2014

Research Collection School Of Computing and Information Systems

The adoption of smart card technologies and automated data collection systems (ADCS) in the transportation domain has provided public transport planners with opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However, the explosive growth of temporal databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Jun 2014

David LO

Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …
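
As a rough illustration of the input and output described in this abstract (a set of libraries in use, a ranked list of suggestions out), here is a minimal Python sketch based on pairwise association rules. The abstract is truncated before it names the full hybrid approach, so this covers only a simple rule-confidence heuristic and should not be read as the authors' technique; the function `recommend`, the `min_confidence` threshold, and the project data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def recommend(projects, used, min_confidence=0.5):
    """Rank libraries not in `used` by the best confidence of a rule {a} -> {b}."""
    single = Counter()  # number of projects that use each library
    pair = Counter()    # number of projects that use both a and b
    for libs in projects:
        libs = set(libs)
        single.update(libs)
        pair.update(permutations(sorted(libs), 2))  # ordered pairs (a, b)
    scores = {}
    for (a, b), n in pair.items():
        if a in used and b not in used:
            confidence = n / single[a]  # share of a's projects that also use b
            if confidence >= min_confidence:
                scores[b] = max(scores.get(b, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda lib: (-scores[lib], lib))


# Hypothetical usage data, for illustration only.
projects = [
    {"junit", "log4j", "commons-io"},
    {"junit", "log4j"},
    {"junit", "guava"},
    {"log4j", "commons-io"},
]
suggestions = recommend(projects, used={"junit"})
```

With this toy data, a project that uses only `junit` is pointed to `log4j`, because two of the three `junit` projects also use it; the other candidates fall below the assumed 0.5 confidence threshold.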


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict user affiliation attributes that suggest user participation in different social groups.


Hot Zone Identification: Analyzing Effects Of Data Sampling On Spam Clustering, Rasib Khan, Mainul Mizan, Ragib Hasan, Alan Sprague Jan 2014

Journal of Digital Forensics, Security and Law

Email is the most common and comparatively the most efficient means of exchanging information in today's world. However, given the widespread use of emails in all sectors, they have been the target of spammers since the beginning. Filtering spam emails has now led to critical actions such as forensic activities based on mining spam email. The data mine for spam emails at the University of Alabama at Birmingham is considered to be one of the most prominent resources for mining and identifying spam sources. It is a widely researched repository used by researchers from different global organizations. The usual process …


Guiding Data-Driven Transportation Decisions, Kristin A. Tufte, Basem Elazzabi, Nathan Hall, Morgan Harvey, Kath Knobe, David Maier, Veronika Margaret Megler Jan 2014

Computer Science Faculty Publications and Presentations

Urban transportation professionals are under increasing pressure to perform data-driven decision making and to provide data-driven performance metrics. This pressure comes from sources including the federal government and is driven, in part, by the increased volume and variety of transportation data available. This sudden increase of data is partially a result of improved technology for sensors and mobile devices as well as reduced device and storage costs. However, using this proliferation of data for decisions and performance metrics is proving to be difficult. In this paper, we describe a proposed structure for a system to support data-driven decision making. A …


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Oct 2013

Research Collection School Of Computing and Information Systems

Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …


Generative Models For Item Adoptions Using Social Correlation, Freddy Chong Tat Chua, Hady Wirawan Lauw, Ee Peng Lim Sep 2013

Research Collection School Of Computing and Information Systems

Users face many choices on the Web when it comes to choosing which product to buy, which video to watch, etc. In making adoption decisions, users rely not only on their own preferences, but also on friends. We call the latter social correlation, which may be caused by the homophily and social influence effects. In this paper, we focus on modeling social correlation on users’ item adoptions. Given a user-user social graph and an item-user adoption graph, our research seeks to answer the following questions: whether the items adopted by a user correlate with items adopted by her friends, and …


Mining Sensor Datasets With Spatiotemporal Neighborhoods, Michael Patrick McGuire, Vandana Janeja, Aryya Gangopadhyay Jun 2013

Journal of Spatial Information Science

Many spatiotemporal data mining methods depend on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods to automatically find spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals and a graph-based method to find spatial neighborhoods within …


Exploring Place Through User-Generated Content: Using Flickr Tags To Describe City Cores, Livia Hollenstein, Ross Purves Oct 2012

Journal of Spatial Information Science

Terms used to describe city centers, such as Downtown, are key concepts in everyday or vernacular language. Here, we explore such language by harvesting georeferenced and tagged metadata associated with 8 million Flickr images and thus consider how large numbers of people name city core areas. The nature of errors and imprecision in tagging and georeferencing is quantified, and automatically generated precision measures appear to mirror errors in the positioning of images. Users seek to ascribe appropriate semantics to images, though bulk-uploading and bulk-tagging may introduce bias. Between 0.5% and 2% of tags associated with georeferenced images analyzed describe city core areas …


Empirical Methods For Predicting Student Retention- A Summary From The Literature, Matt Bogard May 2011

Economics Faculty Publications

The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for the consideration of which explanatory variables to include in any model specification, as well as identifying possible data sources. The literature separates itself into two major camps including research related to the hypothesis testing …


Enterprise Users And Web Search Behavior, April Ann Lewis May 2010

Masters Theses

This thesis describes an analysis of user web query behavior associated with Oak Ridge National Laboratory’s (ORNL) Enterprise Search System (hereafter, ORNL Intranet). The ORNL Intranet provides users a means to search all kinds of data stores for relevant business and research information using a single query. The Global Intranet Trends for 2010 Report suggests the biggest current obstacle for corporate intranets is “findability and Siloed content”. Intranets differ from internets in the way they create, control, and share content, which can make it difficult and sometimes impossible for users to find information. Stenmark (2006) first noted studies of corporate …