Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Social and Behavioral Sciences

Data mining

Articles 1 - 30 of 46

Full-Text Articles in Physical Sciences and Mathematics

Quantification Of Landside Congestion In Ports: An Analysis Based On Gps Data, Kumushini Thennakoon, Namal Bandaranayake, Senevi Kiridena, Asela K. Kulatunga Jan 2024

Computer Science Faculty Publications

Hinterland transport is a critical segment in maritime cross-border logistics, which links the end-users of global supply chains to the maritime segment. Truck-based hinterland transport is known to cause congestion in and around ports. This study aimed to quantify the congestion caused by trucks at the Port of Colombo, which has not been a subject of a systematic study. To this end, the study makes use of GPS data. In addition to revealing heavy congestion within the port, the study also reveals significant variations in congestion during different times of the day with the duration of journeys peaking from 1200hrs …


Analyzing The Production And Use Of Fossil Fuels: A Case For Data Mining And Gis, Alejandro Conde Oct 2022

Geography and the Environment: Graduate Student Capstones

As technology progresses and data grows both larger and more complex, techniques are being developed to keep up with the exponential growth of information. The term “data mining” is a blanket term used to describe an approach to find anomalies and correlations in a large dataset. This approach involves leveraging data mining software to manipulate and prepare data, apply statistics to quantify trends and characteristics in the data from a high level, and potentially apply advanced techniques like machine learning to identify patterns that wouldn’t be apparent otherwise. In this case study, data mining aided a GIS in displaying substantial …


A Remote Sensing And Machine Learning-Based Approach To Forecast The Onset Of Harmful Algal Bloom (Red Tides), Moein Izadi Apr 2022

Dissertations

In the last few decades, harmful algal blooms (HABs, also known as “red tides”) have become one of the most detrimental natural phenomena around the world, especially in Florida’s coastal areas, due to local environmental factors and, on a larger scale, global warming. Karenia brevis produces toxins that have harmful effects on humans, fisheries, and ecosystems. In this study, I developed and compared the efficiency of state-of-the-art machine learning models (e.g., XGBoost, Random Forest, and Support Vector Machine) in predicting the occurrence of HABs. In the proposed models, the K. brevis abundance is used as the target, and 10 …


Data-Driven Operational And Safety Analysis Of Emerging Shared Electric Scooter Systems, Qingyu Ma Dec 2021

Computational Modeling & Simulation Engineering Theses & Dissertations

The rapid rise of shared electric scooter (E-Scooter) systems offers many urban areas a new micro-mobility solution. The portable and flexible characteristics have made E-Scooters a competitive mode for short-distance trips. Compared to other modes such as bikes, E-Scooters allow riders to freely ride on different facilities such as streets, sidewalks, and bike lanes. However, sharing lanes with vehicles and other users tends to cause safety issues for riding E-Scooters. Conventional methods are often not applicable for analyzing such safety issues because well-archived historical crash records are not commonly available for emerging E-Scooters.

Perceiving the growth of such a micro-mobility …


Big Data: Ethics, Resources, And Potential Collaboration, Matthew Zook Feb 2021

Geography Presentations

This presentation goes over 10 simple rules for responsible big data research.


Toward Tweet-Mining Framework For Extracting Terrorist Attack-Related Information And Reporting, Farkhund Iqbal, Rabia Batool, Benjamin C. M. Fung, Saiqa Aleem, Ahmed Abbasi, Abdul Rehman Javed Jan 2021

All Works

The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific …


Applied Deep Learning In Intelligent Transportation Systems And Embedding Exploration, Xiaoyuan Liang Aug 2019

Dissertations

Deep learning techniques have achieved tremendous success in many real applications in recent years and show great potential in many areas, including transportation. Even though transportation has become increasingly indispensable in people’s daily lives, its related problems, such as traffic congestion and energy waste, have not been completely solved, and some have become even more critical. This dissertation focuses on solving the following fundamental problems in the transportation field using deep learning: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization …


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko Jun 2019

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced if the cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in …


The Paradox Of Big Data, Gary N. Smith Jan 2019

Pomona Economics

Data mining is often used to discover patterns in Big Data. It is tempting to believe that because an unearthed pattern is unusual it must be meaningful, but patterns are inevitable in Big Data and usually meaningless. The paradox of Big Data is that data mining is most seductive when there are a large number of variables, but a large number of variables exacerbates the perils of data mining.


Citationally Enhanced Semantic Literature Based Discovery, John David Fleig Jan 2019

CCE Theses and Dissertations

We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its amazing abilities, the human mind is woefully inadequate at processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find you a document whose content best matches your query - if …


Data Mining Approach To The Detection Of Suicide In Social Media: A Case Study Of Singapore, Jane H. K. Seah, Kyong Jin Shim Dec 2018

Research Collection School Of Computing and Information Systems

In this research, we focus on the social phenomenon of suicide. Specifically, we perform social sensing on digital traces obtained from Reddit. We analyze the posts and comments that are related to depression and suicide. We perform natural language processing to better understand different aspects of human life that relate to suicide.


Traffic-Cascade: Mining And Visualizing Lifecycles Of Traffic Congestion Events Using Public Bus Trajectories, Agus Trisnajaya Kwee, Meng-Fen Chiang, Philips Kokoh Prasetyo, Ee-Peng Lim Oct 2018

Research Collection School Of Computing and Information Systems

As road transportation supports both economic and social activities in developed cities, it is important to maintain smooth traffic on all highways and local roads. Whenever possible, traffic congestion should be detected early and resolved quickly. While existing traffic monitoring dashboard systems have been put in place in many cities, these systems require high-cost vehicle speed monitoring instruments and detect traffic congestion as independent events. There is a lack of low-cost dashboards to inspect and analyze the lifecycle of traffic congestion, which is critical in assessing the overall impact of congestion, determining the possible source(s) of congestion and its …


Learning Latent Characteristics Of Locations Using Location-Based Social Networking Data, Thanh Nam Doan May 2018

Dissertations and Theses Collection (Open Access)

This dissertation addresses the modeling of latent characteristics of locations to describe the mobility of users of location-based social networking platforms. With many users signing up for location-based social networking platforms to share their daily activities, these platforms become a gold mine for researchers to study human visitation behavior and location characteristics. Modeling such visitation behavior and location characteristics can benefit many useful applications such as urban planning and location-aware recommender systems. In this dissertation, we focus on modeling two latent characteristics of locations, namely area attraction and neighborhood competition effects, using location-based social network data. Our literature survey …


Understanding The Novice Decision-Making Process In Forensic Footwear Examinations: Accuracy And Decision Rules, Madonna A. Nobel Jan 2018

Graduate Theses, Dissertations, and Problem Reports

The reproducibility of experience-based forensic pattern interpretation is founded on the notion that domain-specific knowledge can be successfully distributed and applied among experts within a group. This assumption persists, even when the examination is complicated by variations in case circumstances, such as impression clarity and totality, as well as media, substrate, collection mechanism and enhancement. While it is further theorized that many of these factors (as well as additional confounding factors) are at play during an examination, the manner and extent to which these sources of variability affect the examination of footwear evidence remain unclear. In order to explore this …


Clinical Information Extraction From Unstructured Free-Texts, Mingzhe Tao Jan 2018

Legacy Theses & Dissertations (2009 - 2024)

Information extraction (IE) is a fundamental component of natural language processing (NLP) that provides a deeper understanding of texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, clinically relevant information is presented in an unstructured format, predominantly consisting of free text, making it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminologies makes automatic extraction …


So What Are You Going To Do With That? The Promises And Pitfalls Of Massive Data Sets, Sigrid Anderson Cordell, Melissa Gomis Jan 2017

UNL Libraries: Faculty Publications

This article takes as its case study the challenge of data sets for text mining, sources that offer tremendous promise for digital humanities (DH) methodology but present specific challenges for humanities scholars. These text sets raise a range of issues: What skills do you train humanists to have? What is the library’s role in enabling and supporting use of those materials? How do you allocate staff? Who oversees sustainability and data management? By addressing these questions through a specific use case scenario, this article shows how these questions are central to mapping out future directions for a range of library …


Detection Of Cyberbullying In Sms Messaging, Bryan W. Bradley Jul 2016

Computer Science Summer Fellows

Cyberbullying is a type of bullying that uses technology such as cell phones to harass or malign another person. To detect acts of cyberbullying, we are developing an algorithm that will detect cyberbullying in SMS (text) messages. Over 80,000 text messages have been collected by software installed on cell phones carried by participants in our study. This paper describes the development of the algorithm to detect cyberbullying messages, using the cell phone data collected previously. The algorithm works by first separating the messages into conversations in an automated way. The algorithm then analyzes the conversations and scores the severity and …


Mining And Clustering Mobility Evolution Patterns From Social Media For Urban Informatics, Chien-Cheng Chen, Meng-Fen Chiang, Wen-Chih Peng May 2016

Research Collection School Of Computing and Information Systems

In this paper, given a set of check-in data, we aim to discover the representative daily movement behavior of users in a city. For example, daily movement behavior on a weekday may show users moving from one spatial region to another, associated with time information. Since check-in data contain both spatial and temporal information, we propose a mobility evolution pattern to capture the daily movement behavior of users in a city. Furthermore, given a set of daily mobility evolution patterns, we formulate their similarity distances and then discover representative mobility evolution patterns via the clustering process. Representative mobility evolution patterns are …


Identifying Terrorist Affiliations Through Social Network Analysis Using Data Mining Techniques, Govand A. Ali Apr 2016

Information Technology Master Theses

In a technologically enabled world, local ideologically inspired warfare becomes global all too quickly. Specifically, terrorist groups like Al Qaeda and ISIS (Daesh) have successfully used modern computing technology and social networking environments to broadcast their message, recruit new members, and plot attacks. This is especially true for such platforms as Twitter and encrypted mobile apps like Telegram or the clandestine Alrawi. As early detection of such activity is crucial to attack prevention, data mining techniques have become increasingly important in the fight against the spread of global terrorist activity. This study employs data mining tools to mine Twitter for …


Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri Jan 2016

Wayne State University Dissertations

Predicting time-to-event from longitudinal data where different events occur at different time points is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right censored data (also called survival data). This is a phenomenon where instances whose event of interest is not yet observed within a given observation time window are considered to be right censored. Effective models for predicting time-to-event labels from such right censored data with good accuracy can have a significant impact in these …


Capstone Projects Mining System For Insights And Recommendations, Melvrivk Aik Chun Goh, Swapna Gottipati, Venky Shankararaman Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we present a classification-based system to discover knowledge and trends in higher education students’ projects. Essentially, educational capstone projects provide an opportunity for students to apply what they have learned and prepare themselves for industry needs. Therefore, mining such projects gives insights into students’ experiences as well as industry project requirements and trends. In particular, we mine capstone projects executed by Information Systems students to discover patterns and insights related to people, organization, domain, industry needs and time. We build a capstone projects mining system (CPMS) based on classification models that leverage text mining, natural …


Intelligshop: Enabling Intelligent Shopping In Malls Through Location-Based Augmented Reality, Aditi Adhikari, Vincent W. Zheng, Hong Cao, Miao Lin, Yuan Fang, Kevin Chen-Chuan Chang Nov 2015

Research Collection School Of Computing and Information Systems

The shopping experience is important for both citizens and tourists. We present IntelligShop, a novel location-based augmented reality application that supports an intelligent shopping experience in malls. As the key functionality, IntelligShop provides an augmented reality interface: people can simply point their ubiquitous smartphones at mall retailers, and IntelligShop will automatically recognize the retailers and fetch their online reviews from various sources (including blogs, forums and publicly accessible social media) to display on the phones. Technically, IntelligShop addresses two challenging data mining problems, including robust feature learning to support heterogeneous smartphones in localization and learning to query for automatically gathering the retailer content …


Should We Use The Sample? Analyzing Datasets Sampled From Twitter's Stream Api, Yazhe Wang, Jamie Callan, Baihua Zheng Jun 2015

Research Collection School Of Computing and Information Systems

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill …


Author Topic Model-Based Collaborative Filtering For Personalized Poi Recommendations, Shuhui Jiang, Xueming Qian, Jialie Shen, Yun Fu, Tao Mei Jun 2015

Research Collection School Of Computing and Information Systems

Social media has given rise to a continuous need for automatic travel recommendations. Collaborative filtering (CF) is the most well-known approach. However, existing approaches generally suffer from various weaknesses. For example, sparsity can significantly degrade the performance of traditional CF. If a user visits only a few locations, accurately identifying similar users becomes very challenging due to a lack of sufficient information for effective inference. Moreover, existing recommendation approaches often ignore rich user information, such as textual descriptions of photos, which can reflect users' travel preferences. The topic model (TM) method is an effective way to solve the "sparsity problem," but is still far …


Twitter Location (Sometimes) Matters: Exploring The Relationship Between Georeferenced Tweet Content And Nearby Feature Classes, Stefan Hahmann, Ross S. Purves, Dirk Burghardt Dec 2014

Journal of Spatial Information Science

In this paper, we investigate whether microblogging texts (tweets) produced on mobile devices are related to the geographical locations where they were posted. For this purpose, we correlate tweet topics to areas. In doing so, classified points of interest from OpenStreetMap serve as validation points. We adopted the classification and geolocation of these points to correlate with tweet content by means of manual, supervised, and unsupervised machine learning approaches. Evaluation showed the manual classification approach to be of the highest quality, followed by the supervised method, while the unsupervised classification was of low quality. We found that the degree to which …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam Oct 2014

Research Collection School Of Computing and Information Systems

The adoption of smart card technologies and automated data collection systems (ADCS) in the transportation domain has provided public transport planners with opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However, the explosive growth of temporal databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Jun 2014

David LO

Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict some user affiliation attributes that suggest user participation in different social groups.


Guiding Data-Driven Transportation Decisions, Kristin A. Tufte, Basem Elazzabi, Nathan Hall, Morgan Harvey, Kath Knobe, David Maier, Veronika Margaret Megler Jan 2014

Computer Science Faculty Publications and Presentations

Urban transportation professionals are under increasing pressure to perform data-driven decision making and to provide data-driven performance metrics. This pressure comes from sources including the federal government and is driven, in part, by the increased volume and variety of transportation data available. This sudden increase of data is partially a result of improved technology for sensors and mobile devices as well as reduced device and storage costs. However, using this proliferation of data for decisions and performance metrics is proving to be difficult. In this paper, we describe a proposed structure for a system to support data-driven decision making. A …


Hot Zone Identification: Analyzing Effects Of Data Sampling On Spam Clustering, Rasib Khan, Mainul Mizan, Ragib Hasan, Alan Sprague Jan 2014

Journal of Digital Forensics, Security and Law

Email is the most common and comparatively the most efficient means of exchanging information in today's world. However, given the widespread use of emails in all sectors, they have been the target of spammers since the beginning. Filtering spam emails has now led to critical actions such as forensic activities based on mining spam email. The data mine for spam emails at the University of Alabama at Birmingham is considered to be one of the most prominent resources for mining and identifying spam sources. It is a widely researched repository used by researchers from different global organizations. The usual process …