Full-Text Articles in Programming Languages and Compilers

R2F: A General Retrieval, Reading And Fusion Framework For Document-Level Natural Language Inference, Hao Wang, Yixin Cao, Yangguang Li, Zhen Huang, Kun Wang, Jing Shao Dec 2022

Research Collection School Of Computing and Information Systems

Document-level natural language inference (DocNLI) is a new, challenging task in natural language processing that aims to judge the entailment relationship between a pair of hypothesis and premise documents. Current datasets and baselines largely follow sentence-level settings, but fail to address the issues raised by longer documents. In this paper, we establish a general solution, named the Retrieval, Reading and Fusion (R2F) framework, and a new setting, by analyzing the main challenges of DocNLI: interpretability, long-range dependency, and cross-sentence inference. The basic idea of the framework is to simplify the document-level task into a set of sentence-level tasks, and improve both performance and …


Using Landsat Satellite Imagery To Estimate Groundcover In The Grainbelt Of Western Australia, Justin Laycock, Nick Middleton, Karen Holmes Dec 2022

Resource management technical reports

Maintaining vegetative groundcover is an important component of sustainable agricultural systems and plays a critical function for soil and land conservation in Western Australia’s (WA) grainbelt (the south-west cropping region). This report describes how satellite imagery can be used to quantitatively and objectively estimate total vegetative groundcover, both in near real time and historically across large areas. We used the Landsat seasonal fractional groundcover products developed by the Joint Remote Sensing Research Program from the extensive archive of Landsat imagery. These products provide an estimate of the percentage of green vegetation, non-green vegetation and bare soil for each 30 m …


Investigating Bloom's Cognitive Skills In Foundation And Advanced Programming Courses From Students' Discussions, Joel Jer Wei Lim, Gottipati Swapna, Kyong Jin Shim Nov 2022

Research Collection School Of Computing and Information Systems

Programming courses provide students with the skills to develop complex business applications. Teaching and learning programming is challenging, and collaborative learning is proposed to help with this challenge. Online discussion forums promote networking with other learners such that they can build knowledge collaboratively. This helps students broaden their thought processes and acquire cognitive skills. Cognitive analysis of discussion is critical to understanding students' learning process. In this paper, we propose a Bloom's taxonomy-based cognitive model for programming discussion forums. We present a machine learning (ML)-based solution to extract students' cognitive skills. Our evaluations on computing courses show that …


VLStereoSet: A Study Of Stereotypical Bias In Pre-Trained Vision-Language Models, Kankan Zhou, Yibin Lai, Jing Jiang Nov 2022

Research Collection School Of Computing and Information Systems

In this paper we study how to measure stereotypical bias in pre-trained vision-language models. We leverage a recently released text-only dataset, StereoSet, which covers a wide range of stereotypical bias, and extend it into a vision-language probing dataset called VLStereoSet to measure stereotypical bias in vision-language models. We analyze the differences between text and image and propose a probing task that detects bias by evaluating a model’s tendency to pick stereotypical statements as captions for anti-stereotypical images. We further define several metrics to measure both a vision-language model’s overall stereotypical bias and its intra-modal and inter-modal bias. Experiments on six …


CodeMatcher: A Tool For Large-Scale Code Search Based On Query Semantics Matching, Chao Liu, Xuanlin Bao, Xin Xia, Meng Yan, David Lo, Ting Zhang Nov 2022

Research Collection School Of Computing and Information Systems

Due to the emergence of large-scale codebases, such as GitHub and Gitee, searching and reusing existing code can help developers substantially improve software development productivity. Over the years, many code search tools have been developed. Early tools leveraged information retrieval (IR) techniques to perform an efficient code search for a frequently changed large-scale codebase. However, the search accuracy was low due to the semantic mismatch between query and code. In recent years, many tools have leveraged deep learning (DL) techniques to address this issue. But the DL-based tools are slow and the search accuracy is unstable. In this paper, we …


Robustness And Cross-Lingual Transfer: An Exploration Of Out-Of-Distribution Scenario In Natural Language Processing, Sicheng Yu Sep 2022

Dissertations and Theses Collection (Open Access)

Most traditional machine learning or deep learning methods are based on the premise that training data and test data are independent and identically distributed (IID). However, this is an idealized situation. In real-world applications, the test set and training data often follow different distributions, which we refer to as the out-of-distribution (OOD) setting. As a result, models trained with traditional methods always suffer from an undesirable performance drop on the OOD test set. It is necessary to develop techniques to solve this problem for real applications. In this dissertation, we present four pieces of work in the …


Early Rumor Detection Using Neural Hawkes Process With A New Benchmark Dataset, Fengzhu Zeng, Wei Gao Jul 2022

Research Collection School Of Computing and Information Systems

Little attention has been paid to EArly Rumor Detection (EARD), and EARD performance was evaluated inappropriately on a few datasets where the actual early-stage information is largely missing. To reverse this situation, we construct BEARD, a new Benchmark dataset for EARD, based on claims from fact-checking websites by trying to gather as many early relevant posts as possible. We also propose HEARD, a novel model based on the neural Hawkes process for EARD, which can guide a generic rumor detection model to make timely, accurate and stable predictions. Experiments show that HEARD achieves effective EARD performance on two commonly used general …


A Weakly Supervised Propagation Model For Rumor Verification And Stance Detection With Multiple Instance Learning, Ruichao Yang, Jing Ma, Hongzhan Lin, Wei Gao Jul 2022

Research Collection School Of Computing and Information Systems

The diffusion of rumors on social media generally follows a propagation tree structure, which provides valuable clues on how an original message is transmitted and responded to by users over time. Recent studies reveal that rumor verification and stance detection are two relevant tasks that can jointly enhance each other despite their differences. For example, rumors can be debunked by cross-checking the stances conveyed by their relevant posts, and stances are also conditioned on the nature of the rumor. However, stance detection typically requires a large training set of labeled stances at post level, which are rare and costly to annotate. …


BlockLens: Visual Analytics Of Student Coding Behaviors In Block-Based Programming Environments, Sean Tung, Huan Wei, Haotian Li, Yong Wang, Meng Xia, Huamin Qu Jun 2022

Research Collection School Of Computing and Information Systems

Block-based programming environments have been widely used to introduce K-12 students to coding. To guide students effectively, instructors and platform owners often need to understand behaviors like how students solve certain questions or where they get stuck and why. However, it is challenging for them to effectively analyze students’ coding data. To this end, we propose BlockLens, a novel visual analytics system to assist instructors and platform owners in analyzing students’ block-based coding behaviors, mistakes, and problem-solving patterns. BlockLens enables the grouping of students by question progress and performance, identification of common problem-solving strategies and pitfalls, and presentation of insights …


Exploring And Adapting Chinese GPT To Pinyin Input Method, Minghuan Tan, Yong Dai, Duyu Tang, Zhangyin Feng, Guoping Huang, Jing Jiang, Jiwei Li, Shuming Shi May 2022

Research Collection School Of Computing and Information Systems

While GPT has become the de facto method for text generation tasks, its application to the pinyin input method remains unexplored. In this work, we make the first exploration of leveraging Chinese GPT for the pinyin input method. We find that a frozen GPT achieves state-of-the-art performance on perfect pinyin. However, the performance drops dramatically when the input includes abbreviated pinyin. One reason is that an abbreviated pinyin can be mapped to many perfect pinyin, which link to an even larger number of Chinese characters. We mitigate this issue with two strategies, including enriching the context with pinyin and optimizing the training process to …


Translate-Train Embracing Translationese Artifacts, Sicheng Yu, Qianru Sun, Hao Zhang, Jing Jiang May 2022

Research Collection School Of Computing and Information Systems

Translate-train is a general training approach to multilingual tasks. The key idea is to use the translator of the target language to generate training data to mitigate the gap between the source and target languages. However, its performance is often hampered by the artifacts in the translated texts (translationese). We discover that such artifacts have common patterns in different languages and can be modeled by deep learning, and subsequently propose an approach to conduct translate-train using Translationese Embracing the effect of Artifacts (TEA). TEA learns to mitigate such effect on the training data of a source language (whose original and …


Using A BERT-Based Ensemble Network For Abusive Language Detection, Noah Ballinger May 2022

Computer Science and Computer Engineering Undergraduate Honors Theses

Over the past two decades, online discussion has skyrocketed in scope and scale. However, so has the amount of toxicity and offensive posts on social media and other discussion sites. Despite this rise in prevalence, the ability to automatically moderate online discussion platforms has seen minimal development. Recently, though, as the capabilities of artificial intelligence (AI) continue to improve, AI-based detection of harmful internet content has become a real possibility. In the past couple of years, there has been a surge in performance on tasks in the field of natural language processing, mainly due to the development of …


On The Influence Of Biases In Bug Localization: Evaluation And Benchmark, Ratnadira Widyasari, Stefanus Agus Haryono, Ferdian Thung, Jieke Shi, Constance Tan, Fiona Wee, Jack Phan, David Lo Mar 2022

Research Collection School Of Computing and Information Systems

Bug localization is the task of identifying parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug report, (2) already localized bug report, and (3) incorrect ground truth file in a bug report. They reported that already localized bug reports statistically significantly and substantially impact bug localization results, and thus should be removed. However, their evaluation is still limited, …


The Effect Of Using The Gamification Strategy On Academic Achievement And Motivation Towards Learning Problem-Solving Skills In Computer And Information Technology Course Among Tenth Grade Female Students, Mazyunah Almutairi, Prof. Ahmad Almassaad Feb 2022

International Journal for Research in Education

This study aimed to identify the effect of using the gamification strategy on academic achievement and motivation towards learning problem-solving skills in computer and information technology course. A quasi-experimental method was adopted. The study population included tenth-grade female students in Al-Badi’ah schools in Riyadh. The sample consisted of 54 students divided into two equal groups: control group and experimental group. The study tools comprised an achievement test and the motivation scale. The results showed that there were statistically significant differences between the two groups in the academic achievement test in favor of the experimental group, with a large effect …


CodeMatcher: Searching Code Based On Sequential Semantics Of Important Query Words, Chao Liu, Xin Xia, David Lo, Zhiwei Liu, Ahmed E. Hassan, Shanping Li Jan 2022

Research Collection School Of Computing and Information Systems

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR)-based models for code search, but they fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model, DeepCS, solved this issue by learning the relationship between pairs of code methods and corresponding natural language descriptions. Two major advantages of DeepCS are the capability of understanding irrelevant/noisy keywords and capturing sequential relationships between words in query and code. In this article, we propose an IR-based model, CodeMatcher, that …


Just-In-Time Defect Prediction On JavaScript Projects: A Replication Study, Chao Ni, Xin Xia, David Lo, Xiaohu Yang, Ahmed E. Hassan Jan 2022

Research Collection School Of Computing and Information Systems

Change-level defect prediction is widely referred to as just-in-time (JIT) defect prediction since it identifies a defect-inducing change at check-in time, and researchers have proposed many approaches based on language-independent change-level features. These approaches can be divided into two types, supervised and unsupervised, and their effectiveness has been verified on Java or C++ projects. However, whether the language-independent change-level features can effectively identify the defects of JavaScript projects is still unknown. Additionally, many studies have confirmed that supervised approaches outperform unsupervised approaches on Java or C++ projects when considering inspection effort. However, whether supervised JIT defect …