Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Engineering

PDF

Electronic Theses and Dissertations

Theses/Dissertations

Data mining

Articles 1 - 17 of 17

Full-Text Articles in Entire DC Network

A Two-Stage Approach To Ridesharing Assignment And Auction In A Crowdsourcing Collaborative Transportation Platform., Peiyu Luo May 2019

Electronic Theses and Dissertations

Collaborative transportation platforms have emerged as an innovative way for firms and individuals to meet their transportation needs by using services from external profit-seeking drivers. A number of collaborative transportation platforms (such as Uber, Lyft, and MyDHL) have arisen in recent years to facilitate such delivery requests. A collaborative transportation platform usually provides a two-sided marketplace, with one set of members (service seekers or passengers) posting tasks and another set of members (service providers or drivers) accepting these tasks and providing services. As the collaborative transportation platform attracts more service seekers and providers, the number of open …


Horse Racing Prediction Using Graph-Based Features., Mehmet Akif Gulum May 2018

Electronic Theses and Dissertations

This thesis presents an applied approach to horse racing prediction using graph-based features on a set of horse race data. We trained and tested artificial neural network and logistic regression models to make predictions both without and with graph-based features. The thesis has four main parts: (1) collect data from a horse racing website for races held from 2015 to 2017; (2) train predictive models on the data and make predictions; (3) create a global directed graph of horses and extract graph-based features (the core part); (4) add the graph-based features to the basic features and train using the same …


Maintainability Analysis Of Mining Trucks With Data Analytics., Abdulgani Kahraman May 2018

Electronic Theses and Dissertations

The mining industry is one of the largest industries and requires a large budget, and current global economic challenges are forcing it to reduce its production expenses. One of the biggest expenditures is maintenance. Thanks to data mining techniques, available historical records of machines’ alarms and signals might be used to predict machine failures. This is crucial because repairing machines after failures is not as efficient as utilizing predictive maintenance. In this case study, the reasons for failures seem to be related to the order of signals or alarms, called events, which come from trucks. The …


Peeking Into The Other Half Of The Glass : Handling Polarization In Recommender Systems., Mahsa Badami May 2017

Electronic Theses and Dissertations

This dissertation is about filtering and discovering information online while using recommender systems. In the first part of our research, we study the phenomenon of polarization and its impact on filtering and discovering information. Polarization is a social phenomenon with serious real-life consequences, particularly on social media. Thus, it is important to understand how machine learning algorithms, especially recommender systems, behave in polarized environments. We study polarization within the context of the users' interactions with a space of items and how this affects recommender systems. We first formalize the concept of polarization based on item ratings and then relate …


Study Of Spatiotemporal Rainfall Structure And Optimized Local Radar Rainfall Application To Urban Watershed, Louisville, Kentucky, 2010-2014., Jin-Young Hyun Dec 2016

Electronic Theses and Dissertations

Many urban areas have combined sewer systems (CSSs) that carry both stormwater runoff and sanitary sewer flows in a single pipe. In the absence of rainfall runoff, most of these systems function adequately; however, CSS capacity is typically inadequate to carry peak stormwater runoff volume. In order to minimize sewage flooding into streets and backups into homes and businesses, most CSSs (as well as separate sanitary sewer systems) are designed to overflow into surface waters such as streams, rivers, lakes, and seas. This occurrence is considered a combined sewer overflow …


Computational Methods To Predict And Enhance Decision-Making With Biomedical Data., Behnaz Abdollahi May 2015

Electronic Theses and Dissertations

The proposed research applies machine learning techniques to healthcare applications. The core idea is to use intelligent techniques to find automatic methods for analyzing healthcare applications. Different classification and feature extraction techniques are applied to various clinical datasets. The datasets include brain MR images, breathing curves from vessels around tumor cells over time, breathing curves extracted from patients with successful or rejected lung transplants, and lung cancer patients diagnosed in the US from 2004 to 2009, extracted from the SEER database. The novel idea for brain MR image segmentation is to develop a multi-scale technique to segment blood vessel tissues from similar tissues …


Text Stylometry For Chat Bot Identification And Intelligence Estimation., Nawaf Ali May 2014

Electronic Theses and Dissertations

Authorship identification is a technique used to identify the author of an unclaimed document by attempting to find traits that match those of the original author. Authorship identification has great potential for applications in forensics. It can also be used to identify chat bots, a form of intelligent software created to mimic human conversation, by their unique style. The online criminal community is utilizing chat bots as a new way to steal private information and commit fraud and identity theft. The need for identifying chat bots by their style is becoming essential to overcome the danger of …


An Unsupervised Consensus Control Chart Pattern Recognition Framework, Siavash Haghtalab Jan 2014

Electronic Theses and Dissertations

Early identification and detection of abnormal time series patterns is vital for a number of manufacturing processes. Slight shifts and alterations of time series patterns might be indicative of some anomaly in the production process, such as machinery malfunction. Because of the continuous flow of data, monitoring of manufacturing processes usually requires automated Control Chart Pattern Recognition (CCPR) algorithms. The majority of the CCPR literature consists of supervised classification algorithms. Fewer studies consider unsupervised versions of the problem. Despite the profound advantage of unsupervised methods in requiring less manual data labeling, their use is limited because their performance is not …


Learning Collective Behavior In Multi-Relational Networks, Xi Wang Jan 2014

Electronic Theses and Dissertations

With the rapid expansion of the Internet and WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels …


Integrated Data Fusion And Mining (IDFM) Technique For Monitoring Water Quality In Large And Small Lakes, Benjamin Vannah Jan 2013

Electronic Theses and Dissertations

Monitoring water quality on a near-real-time basis to address water resources management and public health concerns in coupled natural systems and the built environment is by no means an easy task. Furthermore, this emerging societal challenge will continue to grow, due to the ever-increasing anthropogenic impacts upon surface waters. For example, urban growth and agricultural operations have led to an influx of nutrients into surface waters stimulating harmful algal bloom formation, and stormwater runoff from urban areas contributes to the accumulation of total organic carbon (TOC) in surface waters. TOC in surface waters is a known precursor of disinfection byproducts …


Multi-Level Safety Performance Functions For High Speed Facilities, Mohamed Ahmed Jan 2012

Electronic Theses and Dissertations

High speed facilities are considered the backbone of any successful transportation system; interstates, freeways, and expressways carry the majority of daily trips on the transportation network. Although these roads are considered relatively safe compared with other road types, they still experience many crashes, many of which are severe, which not only affect human lives but can also have tremendous economic and social impacts. These facts signify the necessity of enhancing the safety of these high speed facilities to ensure better and more efficient operation. Safety problems could be assessed through several approaches that can help in mitigating the …


A Study Of Factors Contributing To Self-Reported Anomalies In Civil Aviation, Chris Andrzejczak Jan 2010

Electronic Theses and Dissertations

This study investigated which factors lead pilots to submit voluntary anomaly reports regarding their flight performance. The study employed statistical methods, text mining, clustering, and dimensional reduction techniques in an effort to determine relationships between factors and anomalies. A review of the literature was conducted to determine what factors contribute to these anomalous incidents, as well as what research exists on human error, its causes, and its management. Data from the NASA Aviation Safety Reporting System (ASRS) was analyzed using traditional statistical methods such as frequencies and multinomial logistic regression. Recently formalized approaches in text …


Detecting Malicious Software By Dynamic Execution, Jianyong Dai Jan 2009

Electronic Theses and Dissertations

The traditional way to detect malicious software is based on signature matching. However, signature matching only detects known malicious software. In order to detect unknown malicious software, it is necessary to analyze the software for its impact on the system when the software is executed. In one approach, the software code can be statically analyzed for any malicious patterns. Another approach is to execute the program and determine the nature of the program dynamically. Since the execution of malicious code may have a negative impact on the system, the code must be executed in a controlled environment. For that purpose, we have …


Multivariate Discretization Of Continuous Valued Attributes., Ehab Ahmed El Sayed Ahmed 1978- Dec 2006

Electronic Theses and Dissertations

The area of knowledge discovery and data mining is growing rapidly. Feature discretization is a crucial issue in Knowledge Discovery in Databases (KDD), or data mining, because most data sets used in real-world applications have features with continuous values. Discretization is performed as a preprocessing step of data mining to make data mining techniques useful for these data sets. This thesis addresses the discretization issue by proposing a multivariate discretization (MVD) algorithm. It begins with a number of common discretization algorithms, such as equal-width discretization, equal-frequency discretization, naïve discretization, entropy-based discretization, chi-square discretization, and orthogonal hyperplanes. After …
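
As a rough illustration of the two simplest baselines named in the abstract above, equal-width and equal-frequency discretization can be sketched as follows. This is not the thesis's MVD algorithm; the function names and the sample data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value lands on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k bins holding roughly equal counts.

    Bins are assigned by rank, so tied values may end up in different bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // n, k - 1)
    return bins


ages = [1.0, 2.0, 3.0, 4.0, 50.0, 100.0]  # hypothetical sample data
width_labels = equal_width_bins(ages, 3)
freq_labels = equal_frequency_bins(ages, 3)
```

On this skewed sample, equal-width binning places four of the six values in the first bin, while equal-frequency binning places two values in each bin.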


Estimation Of Hybrid Models For Real-Time Crash Risk Assessment On Freeways, Anurag Pande Jan 2005

Electronic Theses and Dissertations

The relevance of reactive traffic management strategies such as freeway incident detection has been diminishing with advancements in mobile phone usage and video surveillance technology. On the other hand, the capacity to collect, store, and analyze traffic data from underground loop detectors has witnessed enormous growth in the recent past. These two facts together provide us with motivation as well as the means to shift the focus of freeway traffic management toward proactive strategies that would involve anticipating incidents such as crashes. The primary element of a proactive traffic management strategy would be model(s) that can separate 'crash prone' conditions from 'normal' traffic …


High Performance Data Mining Techniques For Intrusion Detection, Muazzam Ahmed Siddiqui Jan 2004

Electronic Theses and Dissertations

The rapid growth of computers transformed the way in which information and data were stored. With this new paradigm of data access comes the threat of this information being exposed to unauthorized and unintended users. Many systems have been developed which scrutinize the data for a deviation from the normal behavior of a user or system, or search for a known signature within the data. These systems are termed Intrusion Detection Systems (IDS). These systems employ different techniques varying from statistical methods to machine learning algorithms. Intrusion detection systems use audit data generated by operating systems, application software or …


Modifications To The Fuzzy-ARTMAP Algorithm For Distributed Learning In Large Data Sets, Jose R. Castro Jan 2004

Electronic Theses and Dissertations

The Fuzzy–ARTMAP (FAM) algorithm has been proven to be one of the premier neural network architectures for classification problems. FAM can learn online and is usually faster than other neural network approaches. Nevertheless, the learning time of FAM can slow down considerably when the size of the training set increases into the hundreds of thousands. In this dissertation we apply data partitioning and network partitioning to the FAM algorithm in a sequential and parallel setting to achieve better convergence time and to efficiently train with large databases (hundreds of thousands of patterns). We implement our parallelization on a Beowulf …