Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 45

Full-Text Articles in Physical Sciences and Mathematics

Analyzing Taxi Drivers’ Decision-Making And Recommending Strategies For Enhanced Performance: A Data-Driven Approach, Mengyu Ji Jul 2023

Dissertations and Theses Collection (Open Access)

This thesis focuses on analyzing the decision-making process of taxi drivers and providing data-driven strategies to enhance their performance. By examining comprehensive historical data encompassing passenger demand patterns, drivers’ spatial dynamics, and fare structures, valuable insights are gained into drivers’ choices regarding optimal routes, timing, and areas with high demand. Integrating real-time information sources, such as GPS data and passenger updates, allows drivers to adapt their strategies dynamically to changing traffic conditions and emerging demand patterns. Predictive analytics models, including ARIMA, XGBoost, and Linear Regression, are utilized to forecast demand flow at key locations, enabling proactive decision-making and …


Connecting The Dots For Contextual Information Retrieval, Pei-Chi Lo May 2023

Dissertations and Theses Collection (Open Access)

There are many information retrieval tasks that depend on knowledge graphs to return contextually relevant results for a query. We call them Knowledge-enriched Contextual Information Retrieval (KCIR) tasks, and they come in many different forms, including query-based document retrieval, query answering and others. These KCIR tasks often require the input query to be contextualized by additional facts from a knowledge graph, and then use the context representation to perform document or knowledge graph retrieval and prediction. In this dissertation, we present a meta-framework that identifies Contextual Representation Learning (CRL) and Contextual Information Retrieval (CIR) to be the two key components in …


A Study Of The Impact Of Data Intelligence On Software Delivery Performance, Yongdong Dong Mar 2023

Dissertations and Theses Collection (Open Access)

With the rise of big data and artificial intelligence, data intelligence has gradually become a focus of academia and industry. Data intelligence has two obvious characteristics: it is driven by big data and by application scenarios. More and more enterprises extract valuable patterns from data with prediction and decision analysis methods and technologies, such as large-scale data mining, machine learning and deep learning, and use them to improve management and decision-making in complex practice, so as to promote changes in business models, organizational structures and even business strategies, and to improve the operational efficiency of organizations. However, there are few studies …


Mining Product Textual Data For Recommendation Explanations, Le Trung Hoang Nov 2022

Dissertations and Theses Collection (Open Access)

Recommendation explanations help to make sense of recommendations, increasing the likelihood of adoption. Here, we are interested in mining product textual data, an unstructured data type that comes from manufacturers, sellers, or consumers and appears in many places, including titles, summaries, descriptions, reviews, and questions and answers. Such data can be a rich source of information to explain a recommendation. Because the explanation task can be decoupled from the recommendation objective, we can categorize recommendation explanation into the integrated approach, which uses a single interpretable model to produce both recommendation and explanation, and the pipeline approach, which uses a post-hoc explanation model to produce explanation …


Robustness And Cross-Lingual Transfer: An Exploration Of Out-Of-Distribution Scenario In Natural Language Processing, Yu, Sicheng Sep 2022

Dissertations and Theses Collection (Open Access)

Most traditional machine learning or deep learning methods are based on the premise that training data and test data are independent and identically distributed (IID). However, this is an ideal situation. In real-world applications, test data and training data often follow different distributions, which we refer to as the out-of-distribution (OOD) setting. As a result, models trained with traditional methods suffer from an undesirable performance drop on the OOD test set. It is necessary to develop techniques to solve this problem for real applications. In this dissertation, we present four pieces of work in the …


Finding Top-M Leading Records In Temporal Data, Yiyi Wang Jul 2022

Dissertations and Theses Collection (Open Access)

A traditional top-k query retrieves the records that stand out at a certain point in time. On the other hand, a durable top-k query considers how long the records retain their supremacy, i.e., it reports those records that are consistently among the top-k in a given time interval. In this thesis, we introduce a new query to the family of durable top-k formulations. It finds the top-m leading records, i.e., those that rank among the top-k for the longest duration within the query interval. Practically, this query assesses the records based on how long …
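The query definition above can be illustrated with a brute-force sketch. It is not the thesis's algorithm: it assumes, purely for illustration, that every record has one score per discrete time step, and it measures duration as the number of time steps a record spends in the top-k. The function name `top_m_leading` and the data layout are hypothetical.

```python
from collections import Counter
import heapq


def top_m_leading(scores, k, m, start, end):
    """Brute-force top-m leading records (illustrative assumption, not the thesis's method).

    scores: dict mapping record id -> list of scores, one per time step.
    Returns up to m (record, duration) pairs, where duration is the number of
    time steps in [start, end] at which the record ranks among the top-k.
    """
    durations = Counter()
    for t in range(start, end + 1):
        # Top-k records at time t; ties are broken by record id for determinism.
        top_k = heapq.nsmallest(k, scores, key=lambda r: (-scores[r][t], r))
        durations.update(top_k)
    # Longest duration first; ties broken by record id.
    ranked = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:m]


example = {
    "a": [9, 8, 7, 1],
    "b": [5, 9, 8, 9],
    "c": [7, 1, 9, 8],
    "d": [1, 2, 3, 4],
}
print(top_m_leading(example, k=2, m=2, start=0, end=3))
```

With k = 2 over the four time steps, records "b" and "c" each spend three steps in the top-2 and "a" spends two, so the sketch reports "b" and "c" as the top-2 leading records. Recomputing the top-k at every time step is costly on long intervals.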


Chinese Idiom Understanding With Transformer-Based Pretrained Language Models, Minghuan Tan May 2022

Dissertations and Theses Collection (Open Access)


In this dissertation, I study the understanding of Chinese idioms using transformer-based pretrained language models. By “understanding”, I confine the topics to word embedding learning, contextualized word representation learning, multiple-choice cloze-test reading comprehension and conditional text generation. Chinese idioms are fixed phrases that have special meanings usually derived from an ancient story. The meanings of these idioms are often not directly related to their component characters, which makes them harder to model than standard phrases whose meanings are compositional. We begin by studying idiom representations derived from pretrained language models, in particular BERT. We adopt probing-based …


Modeling Sentiments And Preferences From Multimodal Data, Quoc Tuan Truong Feb 2022

Dissertations and Theses Collection (Open Access)

Online reviews are prevalent in many modern Web applications, such as e-commerce, crowd-sourced location and check-in platforms. Fueled by the rise of mobile phones that are often the only cameras on hand, reviews are increasingly multimodal, with photos in addition to textual content. In this thesis, we focus on modeling the subjectivity carried in this form of data, with two research objectives.

In the first part, we tackle the problem of detecting the sentiment expressed by a review. This is key to unlocking many applications, e.g., analyzing opinions, monitoring consumer satisfaction, assessing product quality.

Traditionally, the task of sentiment analysis primarily …


Deep Learning For Video-Grounded Dialogue Systems, Hung Le Jan 2022

Dissertations and Theses Collection (Open Access)

In recent years, we have witnessed significant progress in building systems with artificial intelligence. However, despite advancements in machine learning and deep learning, we are still far from achieving autonomous agents that can perceive multi-dimensional information from the surrounding world and converse with humans in natural language. Towards this goal, this thesis is dedicated to building intelligent systems in the task of video-grounded dialogues. Specifically, in a video-grounded dialogue, a system is required to hold a multi-turn conversation with humans about the content of a video. Given an input video, a dialogue history, and a question about the video, the …


The Effects Of Recommender System On Sales Promotion Of High-Value Products: Evidence From A Field Experiment In The Real Estate Industry, Lian Liu Jan 2022

Dissertations and Theses Collection (Open Access)

The real estate sales industry in China has long suffered from inefficient matching of customers to projects. Inspired by the design of recommender systems, which have been widely used in the online retail industry and are shown to facilitate customer-product matching and improve sales, we apply this system to the real estate sales industry using a novel approach. Instead of recommending products to customers, we suggest to salespeople the best potential customers with whom to conduct sales. Using city-wide sales data from the largest real estate sales company in China, we first develop a recommender system based …


Can We Make It Better? Assessing And Improving Quality Of GitHub Repositories, Gede Artha Azriadi Prana Nov 2021

Dissertations and Theses Collection (Open Access)

The code hosting platform GitHub has gained immense popularity worldwide in recent years, with over 200 million repositories hosted as of June 2021. Due to its popularity, it has great potential to facilitate widespread improvements across many software projects. Naturally, GitHub has attracted much research attention, and the source code in the various repositories it hosts also provides an opportunity to apply techniques and tools developed by software engineering researchers over the years. However, much of the existing body of research applicable to GitHub focuses on the code quality of software projects and ways to improve it. Fewer works focus on potential …


Novel Techniques In Recovering, Embedding, And Enforcing Policies For Control-Flow Integrity, Yan Lin Jan 2021

Dissertations and Theses Collection (Open Access)

Control-Flow Integrity (CFI) is an attractive security property with which most injected and code-reuse attacks can be defeated, including advanced attacking techniques like Return-Oriented Programming (ROP). CFI extracts a control-flow graph (CFG) for a given program and instruments the program to respect the CFG. Specifically, checks are inserted before indirect branch instructions. Before these instructions are executed during runtime, the checks consult the CFG to ensure that the indirect branch is allowed to reach the intended target. Hence, any sort of control-flow hijacking would be prevented. There are three fundamental components in CFI enforcement. The first component is accurately recovering …


Vision-Based Analytics For Improved AI-Driven IoT Applications, Amit Sharma Dec 2020

Dissertations and Theses Collection (Open Access)

The proliferation of Internet of Things (IoT) sensor systems, primarily driven by cheaper embedded hardware platforms and the wide availability of lightweight software platforms, has opened the door to large-scale data collection. The availability of massive amounts of data has in turn given way to rapidly growing machine learning models, e.g., You Only Look Once (YOLO), Single-Shot-Detectors (SSD) and so on. There has been a growing trend of applying machine learning techniques, e.g., object detection, image classification, face detection etc., to data collected from camera sensors, thereby enabling a plethora of vision-sensing applications, namely self-driving cars, automatic crowd monitoring, traffic-flow analysis, occupancy …


Deep Learning For Real-World Object Detection, Xiongwei Wu Jul 2020

Dissertations and Theses Collection (Open Access)

Despite significant progress, most existing detectors are designed to detect objects in academic contexts and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than in academic contexts. In addition, existing methods are designed to achieve localization with relatively low precision, whereas real-world scenarios demand more precise localization. Finally, existing methods are optimized with huge amounts of annotated data, but in certain real-world scenarios only a few samples are available. In this dissertation, we aim to explore novel techniques to address these research challenges to make object detection algorithms …


Using Knowledge Bases For Question Answering, Yunshi Lan Mar 2020

Dissertations and Theses Collection (Open Access)

A knowledge base (KB) is a well-structured database that contains many entities and their relations. With the fast development of large-scale knowledge bases such as Freebase, DBpedia and YAGO, knowledge bases have become an important resource that can serve many applications, such as dialogue systems, textual entailment, question answering and so on. These applications play significant roles in real-world industry.

In this dissertation, we try to explore the entailment information and more general entity-relation information from the KBs. Recognizing textual entailment (RTE) is a task to infer the entailment relations between sentences. We need to decide whether a hypothesis …


Modeling Sequential And Basket-Oriented Associations For Top-K Recommendation, Duc-Trong Le Duc Trong Apr 2019

Dissertations and Theses Collection (Open Access)

Top-K recommendation is a typical task in Recommender Systems. In traditional approaches, it mainly relies on the modeling of user-item associations, which emphasizes the user-specific factor or personalization. Here, we investigate another direction that models item-item associations, especially with the notions of sequence-aware and basket-level adoptions. Sequences are created by sorting item adoptions chronologically. The associations between items along sequences, referred to as “sequential associations”, indicate the influence of the preceding adoptions on the following adoptions. Considering a basket of items consumed at the same time step (e.g., a session, a day), “basket-oriented associations” imply correlative dependencies among these …
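The two kinds of item-item association described above can be made concrete with a small counting sketch. It only illustrates the definitions and is not the dissertation's model: simple co-occurrence counting is swapped in for whatever modeling the dissertation uses, and the function name `build_associations` and the `(user, time_step, item)` input layout are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def build_associations(adoptions):
    """Count basket-oriented and sequential item pairs (illustrative only).

    adoptions: iterable of (user, time_step, item) tuples in any order.
    Returns (basket_pairs, sequential_pairs) as Counters keyed by item pairs.
    """
    # Group each user's items into baskets, one basket per time step.
    baskets = defaultdict(lambda: defaultdict(set))
    for user, time_step, item in adoptions:
        baskets[user][time_step].add(item)

    basket_pairs = Counter()
    sequential_pairs = Counter()
    for user_baskets in baskets.values():
        # Sorting the time steps yields the chronological sequence of baskets.
        ordered = [user_baskets[t] for t in sorted(user_baskets)]
        for basket in ordered:
            # Basket-oriented: unordered pairs of items adopted together.
            basket_pairs.update(combinations(sorted(basket), 2))
        for earlier, later in zip(ordered, ordered[1:]):
            # Sequential: ordered pairs (preceding item, following item).
            sequential_pairs.update(product(sorted(earlier), sorted(later)))
    return basket_pairs, sequential_pairs


log = [
    ("u1", 2, "milk"), ("u1", 1, "bread"), ("u1", 1, "butter"),
    ("u2", 1, "bread"), ("u2", 1, "butter"), ("u2", 3, "jam"),
]
basket_pairs, sequential_pairs = build_associations(log)
print(basket_pairs[("bread", "butter")], sequential_pairs[("bread", "milk")])
```

Basket pairs are stored unordered because items in one basket share a time step, while sequential pairs keep their direction, so ("bread", "milk") is counted but ("milk", "bread") is not.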


Question Answering With Textual Sequence Matching, Shuohang Wang Apr 2019

Dissertations and Theses Collection (Open Access)

Question answering (QA) is one of the most important applications in natural language processing. With the explosion of text data on the Internet, intelligently obtaining answers to questions will help humans collect useful information more efficiently. My research in this thesis mainly focuses on solving the question answering problem with textual sequence matching models, which build vectorized representations for pairs of text sequences to enable better reasoning. The thesis consists of three major parts.

In Part I, we propose two general models for building vectorized representations over a pair of sentences, which can be directly used to solve the …


Comparison Mining From Text, Maksim Tkachenko Dec 2018

Dissertations and Theses Collection (Open Access)

Online product reviews are important factors in consumers' purchase decisions. They reach more and more spheres of our lives: we have reviews of books, electronics, groceries, entertainment, restaurants, travel experiences, etc. More than 90 percent of consumers read online reviews before they purchase products, as reported by various consumer surveys. This observation suggests that product review information enhances the consumer experience and helps consumers make better-informed purchase decisions. There is an enormous number of online reviews posted on e-commerce platforms such as Amazon, Apple, Yelp, and TripAdvisor. They vary in information and may be written with different experiences and preferences.

If …


Proactive And Reactive Resource/Task Allocation For Agent Teams In Uncertain Environments, Pritee Agrawal Aug 2018

Dissertations and Theses Collection (Open Access)

Synergistic interactions between task/resource allocation and multi-agent coordinated planning/assignment exist in many problem domains such as transportation and logistics, disaster rescue, security patrolling, sensor networks, power distribution networks, etc. These domains often feature dynamic environments where allocations of tasks/resources may have complex dependencies and agents may leave the team due to unforeseen conditions (e.g., emergency, accident or violation, damage to agent, reconfiguration of environment).


Secure Enforcement Of Isolation Policy On Multicore Platforms With Virtualization Techniques, Siqi Zhao Aug 2018

Dissertations and Theses Collection (Open Access)

A number of virtualization based systems have been proposed in the literature as an effective measure against the adversaries with the kernel privilege. However, under a systematic analysis, such systems exhibit vulnerabilities that can still be exploited by such an attacker with the kernel privilege. The fundamental reason is that there is an inherent incompatibility between the tamper-proof requirement and the complete mediation requirement of the reference monitor model. The incompatibility manifests in the virtualization based systems in the form of a discrepancy between the enforcement capability demanded by the high-level policy and the one achievable through the system design …


Context Recovery In Location-Based Social Networks, Wen Haw Chong Jul 2018

Dissertations and Theses Collection (Open Access)

This dissertation addresses context recovery in Location-Based Social Networks (LBSN), which are platforms where users post content from various locations. With this general LBSN definition, many existing social media platforms that support user-generated, location-relevant content using mobile devices could also qualify as LBSNs. Context recovery for such user posts refers to recovering the venue and the semantic contexts of these user posts. Such information is useful for user profiling and to support various applications such as venue recommendation and location-based advertising.


Entity Summarization Of Reviews And Micro-Reviews, Thanh Son Nguyen May 2018

Dissertations and Theses Collection (Open Access)

Along with regular review content, a new type of user-generated content has arisen from the prevalence of mobile devices and social media: the micro-review. Micro-reviews are bite-size reviews (usually under 200 characters), commonly posted on social media or check-in services using a mobile device. They capture the immediate reaction of users, and they are rich in information, concise, and to the point. Both reviews and micro-reviews help users get to know the entity of interest, thus facilitating their purchasing or dining decisions. However, the abundant number of both reviews …


Recommending APIs For Software Evolution, Ferdian Thung May 2018

Dissertations and Theses Collection (Open Access)

Software is constantly evolving. This evolution has been made easier through the use of Application Programming Interfaces (APIs). By leveraging APIs, developers reuse previously implemented functionality and concentrate on writing new code. These APIs may originate either from third parties or internally from other components of the software currently being developed. In the first case, developers need to know how to find and use third-party APIs. In the second case, developers need to be aware of internal APIs in their own software. In either case, there is often too much information to digest. For instance, finding the …


Automatic Vulnerability Detection And Repair, Siqi Ma May 2018

Dissertations and Theses Collection (Open Access)

Vulnerabilities have become a major threat to the security of many systems, including computer systems (e.g., Windows and Linux) and mobile systems (e.g., Android and iOS). Attackers can steal private information and perform harmful actions by exploiting unpatched vulnerabilities. Vulnerabilities often remain undetected for a long time, as they may not affect the typical functionality of systems. Thus, it is important to detect and repair a vulnerability in time. However, it is often difficult for a developer who is not a security expert to detect and repair a vulnerability correctly and promptly. Fortunately, automatic repair approaches significantly assist developers to …


Proactive Sequential Resource (Re)Distribution For Improving Efficiency In Urban Environments, Supriyo Ghosh Dec 2017

Dissertations and Theses Collection (Open Access)

Due to the increasing population and lack of coordination, there is a mismatch in supply and demand of common resources (e.g., shared bikes, ambulances, taxis) in urban environments, which has deteriorated a wide variety of quality of life metrics such as success rate in issuing shared bikes, response times for emergency needs, waiting times in queues etc. Thus, in my thesis, I propose efficient algorithms that optimise the quality of life metrics by proactively redistributing the resources using intelligent operational (day-to-day) and strategic (long-term) decisions in the context of urban transportation and health & safety. For urban transportation, Bike Sharing …


Online Learning With Nonlinear Models, Doyen Sahoo Dec 2017

Dissertations and Theses Collection (Open Access)

Recent years have witnessed the success of two broad categories of machine learning algorithms: (i) Online Learning; and (ii) Learning with nonlinear models. Typical machine learning algorithms assume that the entire data is available prior to the training task. This is often not the case in the real world, where data often arrives sequentially in a stream, or is too large to be stored in memory. To address these challenges, Online Learning techniques evolved as a promising solution to having highly scalable and efficient learning methodologies which could learn from data arriving sequentially. Next, as the real world data exhibited …


Policy Analytics For Environmental Sustainability: Household Hazardous Waste And Water Impacts Of Carbon Pollution Standards, Kustini Dec 2017

Dissertations and Theses Collection (Open Access)

Policy analytics are essential in supporting more informed policy-making in environmental management. This dissertation employs a fusion of machine methods and explanatory empiricism that involves data analytics, math programming, optimization, econometrics, geospatial and spatiotemporal analysis, and other approaches for assessing and evaluating current and future environmental policies.

Essay 1 discusses household informedness and its impact on the collection and recycling of household hazardous waste (HHW). Household informedness is the degree to which households have the necessary information to make utility-maximizing decisions about the handling of their waste. Such informedness seems to be influenced by HHW public education and environmental quality …


Scalable Online Kernel Learning, Jing Lu Nov 2017

Dissertations and Theses Collection (Open Access)

One critical deficiency of traditional online kernel learning methods is their increasing and unbounded number of support vectors (SV’s), making them inefficient and non-scalable for large-scale applications. Recent studies on budget online learning have attempted to overcome this shortcoming by bounding the number of SV’s. Despite being extensively studied, budget algorithms usually suffer from several drawbacks.

First of all, although existing algorithms attempt to bound the number of SV’s at each iteration, most of them fail to bound the number of SV’s for the final averaged classifier, which is commonly used for online-to-batch conversion. To solve this problem, we propose …


An Integrated Framework For Modeling And Predicting Spatiotemporal Phenomena In Urban Environments, Tuc Viet Le Nov 2017

Dissertations and Theses Collection (Open Access)

This thesis proposes a general solution framework that integrates methods in machine learning in creative ways to solve a diverse set of problems arising in urban environments. It particularly focuses on modeling spatiotemporal data for the purpose of predicting urban phenomena. Concretely, the framework is applied to solve three specific real-world problems: human mobility prediction, traffic speed prediction and incident prediction. For human mobility prediction, I use visitor trajectories collected at a large theme park in Singapore as a simplified microcosm of an urban area. A trajectory is an ordered sequence of attraction visits and corresponding timestamps produced by a visitor. …


User Behavior Mining In Microblogging, Tuan Anh Hoang Jun 2016

Dissertations and Theses Collection (Open Access)

This dissertation addresses the modeling of factors concerning microblogging users' content and behavior. We focus on two sets of factors. The first set includes behavioral factors of users and content items driving content propagation in microblogging. The second set consists of latent topics and communities of users as the users are engaged in content generation and behavior adoptions. These two sets of factors are extremely important in many applications, e.g., network monitoring and recommender systems. In the first part of this dissertation, we identify user virality, user susceptibility, and content virality as three behavioral factors that affect users' behaviors in …