Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 24 of 24

Full-Text Articles in Physical Sciences and Mathematics

R2F: A General Retrieval, Reading And Fusion Framework For Document-Level Natural Language Inference, Hao Wang, Yixin Cao, Yangguang Li, Zhen Huang, Kun Wang, Jing Shao Dec 2022

Research Collection School Of Computing and Information Systems

Document-level natural language inference (DocNLI) is a new challenging task in natural language processing, aiming at judging the entailment relationship between a pair of hypothesis and premise documents. Current datasets and baselines largely follow sentence-level settings, but fail to address the issues raised by longer documents. In this paper, we establish a general solution, named Retrieval, Reading and Fusion (R2F) framework, and a new setting, by analyzing the main challenges of DocNLI: interpretability, long-range dependency, and cross-sentence inference. The basic idea of the framework is to simplify document-level task into a set of sentence-level tasks, and improve both performance and …


A Logistic Regression And Linear Programming Approach For Multi-Skill Staffing Optimization In Call Centers, Thuy Anh Ta, Tien Mai, Fabian Bastin, Pierre L'Ecuyer Dec 2022

Research Collection School Of Computing and Information Systems

We study a staffing optimization problem in multi-skill call centers. The objective is to minimize the total cost of agents under some quality of service (QoS) constraints. The key challenge lies in the fact that the QoS functions have no closed-form and need to be approximated by simulation. In this paper we propose a new way to approximate the QoS functions by logistic functions and design a new algorithm that combines logistic regression, cut generations and logistic-based local search to efficiently find good staffing solutions. We report computational results using examples up to 65 call types and 89 agent groups …


Investigating Bloom's Cognitive Skills In Foundation And Advanced Programming Courses From Students' Discussions, Joel Jer Wei Lim, Gottipati Swapna, Kyong Jin Shim Nov 2022

Research Collection School Of Computing and Information Systems

Programming courses provide students with the skills to develop complex business applications. Teaching and learning programming is challenging, and collaborative learning is proposed to help with this challenge. Online discussion forums promote networking with other learners such that they can build knowledge collaboratively. It helps students open their horizons of thought and acquire cognitive skills. Cognitive analysis of discussion is critical to understand students' learning process. In this paper, we propose a Bloom's taxonomy-based cognitive model for programming discussion forums. We present a machine learning (ML)-based solution to extract students' cognitive skills. Our evaluations on computing courses show that …


CodeMatcher: A Tool For Large-Scale Code Search Based On Query Semantics Matching, Chao Liu, Xuanlin Bao, Xin Xia, Meng Yan, David Lo, Ting Zhang Nov 2022

Research Collection School Of Computing and Information Systems

Due to the emergence of large-scale codebases, such as GitHub and Gitee, searching and reusing existing code can help developers substantially improve software development productivity. Over the years, many code search tools have been developed. Early tools leveraged information retrieval (IR) techniques to perform an efficient code search for a frequently changed large-scale codebase. However, the search accuracy was low due to the semantic mismatch between query and code. In recent years, many tools have leveraged deep learning (DL) techniques to address this issue. But the DL-based tools are slow and the search accuracy is unstable. In this paper, we …


VLStereoSet: A Study Of Stereotypical Bias In Pre-Trained Vision-Language Models, Kankan Zhou, Yibin Lai, Jing Jiang Nov 2022

Research Collection School Of Computing and Information Systems

In this paper we study how to measure stereotypical bias in pre-trained vision-language models. We leverage a recently released text-only dataset, StereoSet, which covers a wide range of stereotypical bias, and extend it into a vision-language probing dataset called VLStereoSet to measure stereotypical bias in vision-language models. We analyze the differences between text and image and propose a probing task that detects bias by evaluating a model’s tendency to pick stereotypical statements as captions for anti-stereotypical images. We further define several metrics to measure both a vision-language model’s overall stereotypical bias and its intra-modal and inter-modal bias. Experiments on six …


Large-Scale Analysis Of Non-Termination Bugs In Real-World OSS Projects, Xiuhan Shi, Xiaofei Xie, Yi Li, Yao Zhang, Sen Chen, Xiaohong Li Nov 2022

Research Collection School Of Computing and Information Systems

Termination is a crucial program property. Non-termination bugs can be subtle to detect and may remain hidden for a long time before they take effect. Many real-world programs still suffer from severe consequences (e.g., no response) caused by non-termination bugs. As a classic problem, termination proving has been studied for many years. Many termination checking tools and techniques have been developed and have demonstrated effectiveness on existing well-established benchmarks. However, the capability of these tools in finding practical non-termination bugs has yet to be tested on real-world projects. To fill in this gap, in this paper, we conducted the first large-scale empirical study …


TransRepair: Context-Aware Program Repair For Compilation Errors, Xueyang Li, Shangqing Liu, Ruitao Feng, Guozhu Meng, Xiaofei Xie, Kai Chen, Yang Liu Oct 2022

Research Collection School Of Computing and Information Systems

Automatically fixing compilation errors can greatly raise the productivity of software development, by guiding novice or AI programmers to write and debug code. Recently, learning-based program repair has gained extensive attention and become the state-of-the-art in practice. But it still leaves plenty of space for improvement. In this paper, we propose an end-to-end solution, TransRepair, to locate the error lines and create the correct substitute for a C program simultaneously. Superior to the counterpart, our approach takes into account the context of erroneous code and diagnostic compilation feedback. Then we devise a Transformer-based neural network to learn the ways …


Towards Understanding The Faults Of JavaScript-Based Deep Learning Systems, Lili Quan, Qianyu Guo, Xiaofei Xie, Sen Chen, Xiaohong Li, Yang Liu Oct 2022

Research Collection School Of Computing and Information Systems

Quality assurance is of great importance for deep learning (DL) systems, especially when they are applied in safety-critical applications. While quality issues of native DL applications have been extensively analyzed, the issues of JavaScript-based DL applications have never been systematically studied. Compared with native DL applications, JavaScript-based DL applications can run on major browsers, making them platform- and device-independent. Specifically, the quality of JavaScript-based DL applications depends on three parts: the application, the third-party DL library used and the underlying DL framework (e.g., TensorFlow.js), together called a JavaScript-based DL system. In this paper, we conduct the first empirical study on the …


Early Rumor Detection Using Neural Hawkes Process With A New Benchmark Dataset, Fengzhu Zeng, Wei Gao Jul 2022

Research Collection School Of Computing and Information Systems

Little attention has been paid to EArly Rumor Detection (EARD), and EARD performance was evaluated inappropriately on a few datasets where the actual early-stage information is largely missing. To reverse this situation, we construct BEARD, a new Benchmark dataset for EARD, based on claims from fact-checking websites by trying to gather as many early relevant posts as possible. We also propose HEARD, a novel model based on neural Hawkes process for EARD, which can guide a generic rumor detection model to make timely, accurate and stable predictions. Experiments show that HEARD achieves effective EARD performance on two commonly used general …


Designing Flipped Learning Activities For Beginner Programming Course, Benjamin Gan, Eng Lieh Ouh Jul 2022

Research Collection School Of Computing and Information Systems

This study focuses on designing flipped classroom learning activities spanning pre-class problem-based exercises, in-class active discussions and practical problem-solving sessions, and follow-up post-class problem-based labs and assessments. We evaluate the effectiveness of our learning activities based on student surveys, course feedback, grades, and teacher feedback for a beginner programming course with non-IS students. We describe detailed programming learning activities with comparisons to existing practices based on related work. Our findings are that the majority of students (86%) agreed with the flipped classroom, but teachers should be aware of the 14% who disagreed and cater for them. Teachers should avoid …


A Weakly Supervised Propagation Model For Rumor Verification And Stance Detection With Multiple Instance Learning, Ruichao Yang, Jing Ma, Hongzhan Lin, Wei Gao Jul 2022

Research Collection School Of Computing and Information Systems

The diffusion of rumors on social media generally follows a propagation tree structure, which provides valuable clues on how an original message is transmitted and responded to by users over time. Recent studies reveal that rumor verification and stance detection are two relevant tasks that can jointly enhance each other despite their differences. For example, rumors can be debunked by cross-checking the stances conveyed by their relevant posts, and stances are also conditioned on the nature of the rumor. However, stance detection typically requires a large training set of labeled stances at post level, which are rare and costly to annotate. …


BlockLens: Visual Analytics Of Student Coding Behaviors In Block-Based Programming Environments, Sean Tung, Huan Wei, Haotian Li, Yong Wang, Meng Xia, Huamin Qu Jun 2022

Research Collection School Of Computing and Information Systems

Block-based programming environments have been widely used to introduce K-12 students to coding. To guide students effectively, instructors and platform owners often need to understand behaviors like how students solve certain questions or where they get stuck and why. However, it is challenging for them to effectively analyze students’ coding data. To this end, we propose BlockLens, a novel visual analytics system to assist instructors and platform owners in analyzing students’ block-based coding behaviors, mistakes, and problem-solving patterns. BlockLens enables the grouping of students by question progress and performance, identification of common problem-solving strategies and pitfalls, and presentation of insights …


ITSS: Interactive Web-Based Authoring And Playback Integrated Environment For Programming Tutorials, Eng Lieh Ouh, Benjamin Gan, David Lo May 2022

Research Collection School Of Computing and Information Systems

Video-based programming tutorials are a popular form of tutorial used by authors to guide learners to code. Still, the interactivity of these videos is limited primarily to control video flow. There are existing works with increased interactivity that are shown to improve the learning experience. Still, these solutions require setting up a custom recording environment and are not well-integrated with the playback environment. This paper describes our integrated ITSS environment and evaluates the ease of authoring and playback of our interactive programming tutorials. Our environment is designed to run within the browser sandbox and is less intrusive to record interactivity …


Graphcode2vec: Generic Code Embedding Via Lexical And Program Dependence Analyses, Wei Ma, Mengjie Zhao, Ezekiel Soremekun, Qiang Hu, Jie M. Zhang, Mike Papadakis, Maxime Cordy, Xiaofei Xie, Yves Le Traon May 2022

Research Collection School Of Computing and Information Systems

Code embedding is a keystone in the application of machine learning on several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is generic. To this end, we propose the first self-supervised pre-training approach (called Graphcode2vec) which produces task-agnostic embedding of lexical and program dependence features. Graphcode2vec achieves this via a synergistic combination of code analysis and Graph Neural Networks. Graphcode2vec is generic, it allows pre-training, and it is applicable to several SE downstream tasks. We evaluate the effectiveness of Graphcode2vec on four (4) …


Translate-Train Embracing Translationese Artifacts, Sicheng Yu, Qianru Sun, Hao Zhang, Jing Jiang May 2022

Research Collection School Of Computing and Information Systems

Translate-train is a general training approach to multilingual tasks. The key idea is to use the translator of the target language to generate training data to mitigate the gap between the source and target languages. However, its performance is often hampered by the artifacts in the translated texts (translationese). We discover that such artifacts have common patterns in different languages and can be modeled by deep learning, and subsequently propose an approach to conduct translate-train using Translationese Embracing the effect of Artifacts (TEA). TEA learns to mitigate such effect on the training data of a source language (whose original and …


Exploring And Adapting Chinese GPT To Pinyin Input Method, Minghuan Tan, Yong Dai, Duyu Tang, Zhangyin Feng, Guoping Huang, Jing Jiang, Jiwei Li, Shuming Shi May 2022

Research Collection School Of Computing and Information Systems

While GPT has become the de facto method for text generation tasks, its application to the pinyin input method remains unexplored. In this work, we make the first exploration to leverage Chinese GPT for the pinyin input method. We find that a frozen GPT achieves state-of-the-art performance on perfect pinyin. However, the performance drops dramatically when the input includes abbreviated pinyin. A reason is that an abbreviated pinyin can be mapped to many perfect pinyin, which link to an even larger number of Chinese characters. We mitigate this issue with two strategies, including enriching the context with pinyin and optimizing the training process to …


JSCSP: A Novel Policy-Based XSS Defense Mechanism For Browsers, Guangquan Xu, Xiaofei Xie, Shuhan Huang, Jun Zhang, Lei Pan, Wei Lou, Kaitai Liang Mar 2022

Research Collection School Of Computing and Information Systems

To mitigate cross-site scripting (XSS) attacks, the W3C group recommends that web service providers employ a computer security standard called Content Security Policy (CSP). However, less than 3.7 percent of real-world websites are equipped with CSP according to Google’s survey. The low scalability of CSP is caused by the difficulty of deployment and incompatibility with state-of-the-art browsers. To explore the scalability of CSP, in this article, we propose JavaScript based CSP (JSCSP), which is able not only to support most real-world browsers but also to generate security policies automatically. Specifically, JSCSP offers a novel self-defined security policy which enforces essential confinements …


On The Influence Of Biases In Bug Localization: Evaluation And Benchmark, Ratnadira Widyasari, Stefanus Agus Haryono, Ferdian Thung, Jieke Shi, Constance Tan, Fiona Wee, Jack Phan, David Lo Mar 2022

Research Collection School Of Computing and Information Systems

Bug localization is the task of identifying parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug report, (2) already localized bug report, and (3) incorrect ground truth file in a bug report. They reported that already localized bug reports statistically significantly and substantially impact bug localization results, and thus should be removed. However, their evaluation is still limited, …


Learning Program Semantics With Code Representations: An Empirical Study, Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu Mar 2022

Research Collection School Of Computing and Information Systems

Program semantics learning is core and fundamental to various code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable number of existing works propose diverse approaches to learn the program semantics for different tasks, and these works have achieved state-of-the-art performance. However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing. From this starting point, in this paper, we conduct an empirical study to evaluate different program representation techniques. Specifically, we categorize current mainstream code representation techniques into four categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based program representation technique and …


Broken External Links On Stack Overflow, Jiakun Liu, Xin Xia, David Lo, Haoxiang Zhang, Ying Zou, Ahmed E. Hassan, Shanping Li Feb 2022

Research Collection School Of Computing and Information Systems

Stack Overflow hosts valuable programming-related knowledge with 11,926,354 links that reference third-party websites. The links that reference resources hosted outside the Stack Overflow websites extend the Stack Overflow knowledge base substantially. However, with the rapid development of programming-related knowledge, many resources hosted on the Internet are no longer available. Based on our analysis of the Stack Overflow data that was released on Jun. 2, 2019, 14.2 percent of the links on Stack Overflow are broken links. The broken links on Stack Overflow can obstruct viewers from obtaining desired programming-related knowledge, and potentially damage the reputation of …


A Quantum Interpretation Of Separating Conjunction For Local Reasoning Of Quantum Programs Based On Separation Logic, Xuan Bach Le, Shang-Wei Lin, Jun Sun, David Sanan Jan 2022

Research Collection School Of Computing and Information Systems

It is well known that quantum programs are not only complicated to write but also tedious to verify due to their enormous state-space and the sophisticated mathematics beneath. In this work, we propose a Hoare-style inference framework to verify quantum programs. We infuse separation logic in our framework and invent the qframe rule, the quantum counterpart of the frame rule, to support local reasoning of quantum programs. The design of our framework is planned with a mindset for intuition and human-readability, using vectors in Dirac notation for reasoning instead of the orthodox matrix representation as in existing approaches. For evaluation, we …


CodeMatcher: Searching Code Based On Sequential Semantics Of Important Query Words, Chao Liu, Xin Xia, David Lo, Zhiwei Liu, Ahmed E. Hassan, Shanping Li Jan 2022

Research Collection School Of Computing and Information Systems

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers proposed many information retrieval (IR)-based models for code search, but they fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model DeepCS solved this issue by learning the relationship between pairs of code methods and corresponding natural language descriptions. Two major advantages of DeepCS are the capability of understanding irrelevant/noisy keywords and capturing sequential relationships between words in query and code. In this article, we propose an IR-based model CodeMatcher that …


Steps Before Syntax: Helping Novice Programmers Solve Problems Using The PCDIT Framework, Oka Kurniawan, Cyrille Jegourel, Norman Tiong Seng Lee, Matthieu De Mari, Christopher M. Poskitt Jan 2022

Research Collection School Of Computing and Information Systems

Novice programmers often struggle with problem solving due to the high cognitive loads they face. Furthermore, many introductory programming courses do not explicitly teach it, assuming that problem solving skills are acquired along the way. In this paper, we present 'PCDIT', a non-linear problem solving framework that provides scaffolding to guide novice programmers through the process of transforming a problem specification into an implemented and tested solution for an imperative programming language. A key distinction of PCDIT is its focus on developing concrete cases for the problem early without actually writing test code: students are instead encouraged to think about …


Just-In-Time Defect Prediction On JavaScript Projects: A Replication Study, Chao Ni, Xin Xia, David Lo, Xiaohu Yang, Ahmed E. Hassan Jan 2022

Research Collection School Of Computing and Information Systems

Change-level defect prediction is widely referred to as just-in-time (JIT) defect prediction since it identifies a defect-inducing change at check-in time, and researchers have proposed many approaches based on language-independent change-level features. These approaches can be divided into two types: supervised approaches and unsupervised approaches, and their effectiveness has been verified on Java or C++ projects. However, whether the language-independent change-level features can effectively identify the defects of JavaScript projects is still unknown. Additionally, many studies have confirmed that supervised approaches outperform unsupervised approaches on Java or C++ projects when considering inspection effort. However, whether supervised JIT defect …