Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

A Bert-Based Dual Embedding Model For Chinese Idiom Prediction, Minghuan Tan, Jing Jiang Dec 2020

A Bert-Based Dual Embedding Model For Chinese Idiom Prediction, Minghuan Tan, Jing Jiang

Research Collection School Of Computing and Information Systems

Chinese idioms are special fixed phrases usually derived from ancient stories, whose meanings are oftentimes highly idiomatic and non-compositional. The Chinese idiom prediction task is to select the correct idiom from a set of candidate idioms given a context with a blank. We propose a BERT-based dual embedding model to encode the contextual words as well as to learn dual embeddings of the idioms. Specifically, we first match the embedding of each candidate idiom with the hidden representation corresponding to the blank in the context. We then match the embedding of each candidate idiom with the hidden representations of all …


Cross-Thought For Sentence Encoder Pre-Training, Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jingjing Liu, Jing Jiang Nov 2020

Cross-Thought For Sentence Encoder Pre-Training, Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jingjing Liu, Jing Jiang

Research Collection School Of Computing and Information Systems

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering. Instead of using the original signals of full sentences, we train a Transformer-based sequence encoder over a large set of short sequences, which allows the model to automatically select the most useful information for predicting masked words. Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders trained with continuous sentence signals as well as traditional masked language modeling baselines. Our proposed approach also …


What Was Written Vs. Who Read It: News Media Profiling Using Text Analysis And Social Media Context, Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav. Nakov Jul 2020

What Was Written Vs. Who Read It: News Media Profiling Using Text Analysis And Social Media Context, Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav. Nakov

Research Collection School Of Computing and Information Systems

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Thus, it has been proposed to profile entire news outlets and to look for those that are likely to publish fake or biased content. This makes it possible to detect likely “fake news” the moment they are published, by simply checking the reliability of their …


Automated Synthesis Of Local Time Requirement For Service Composition, Étienne André, Tian Huat Tan, Manman Chen, Shuang Liu, Jun Sun, Yang Liu, Jin Song Dong Jul 2020

Automated Synthesis Of Local Time Requirement For Service Composition, Étienne André, Tian Huat Tan, Manman Chen, Shuang Liu, Jun Sun, Yang Liu, Jin Song Dong

Research Collection School Of Computing and Information Systems

Service composition aims at achieving a business goal by composing existing service-based applications or components. The response time of a service is crucial, especially in time-critical business environments, which is often stated as a clause in service-level agreements between service providers and service users. To meet the guaranteed response time requirement of a composite service, it is important to select a feasible set of component services such that their response time will collectively satisfy the response time requirement of the composite service. In this work, we use the BPEL modeling language that aims at specifying Web services. We extend it …


Securing Bring-Your-Own-Device (Byod) Programming Exams, Oka Kurniawan, Norman Tiong Seng Lee, Christopher M. Poskitt Mar 2020

Securing Bring-Your-Own-Device (Byod) Programming Exams, Oka Kurniawan, Norman Tiong Seng Lee, Christopher M. Poskitt

Research Collection School Of Computing and Information Systems

Traditional pen and paper exams are inadequate for modern university programming courses as they are misaligned with pedagogies and learning objectives that target practical coding ability. Unfortunately, many institutions lack the resources or space to be able to run assessments in dedicated computer labs. This has motivated the development of bring-your-own-device (BYOD) exam formats, allowing students to program in a similar environment to how they learnt, but presenting instructors with significant additional challenges in preventing plagiarism and cheating. In this paper, we describe a BYOD exam solution based on lockdown browsers, software which temporarily turns students' laptops into secure workstations …


Mcdpc: Multi‐Center Density Peak Clustering, Yizhang Wang, Di Wang, Xiaofeng Zhang, Wei Pang, Chunyan Miao, Ah-Hwee Tan, You Zhou Feb 2020

Mcdpc: Multi‐Center Density Peak Clustering, Yizhang Wang, Di Wang, Xiaofeng Zhang, Wei Pang, Chunyan Miao, Ah-Hwee Tan, You Zhou

Research Collection School Of Computing and Information Systems

Density peak clustering (DPC) is a recently developed density-based clustering algorithm that achieves competitive performance in a non-iterative manner. DPC is capable of effectively handling clusters with single density peak (single center), i.e., based on DPC’s hypothesis, one and only one data point is chosen as the center of any cluster. However, DPC may fail to identify clusters with multiple density peaks (multi-centers) and may not be able to identify natural clusters whose centers have relatively lower local density. To address these limitations, we propose a novel clustering algorithm based on a hierarchical approach, named multi-center density peak clustering (McDPC). …


Memory And Resource Leak Defects And Their Repairs In Java Projects, Mohammadreza Ghanavati, Diego Costa, Janos Seboek, David Lo, Artur Andrzejak Jan 2020

Memory And Resource Leak Defects And Their Repairs In Java Projects, Mohammadreza Ghanavati, Diego Costa, Janos Seboek, David Lo, Artur Andrzejak

Research Collection School Of Computing and Information Systems

Despite huge software engineering efforts and programming language support, resource and memory leaks are still a troublesome issue, even in memory-managed languages such as Java. Understanding the properties of leak-inducing defects, how the leaks manifest, and how they are repaired is an essential prerequisite for designing better approaches for avoidance, diagnosis, and repair of leak-related bugs. We conduct a detailed empirical study on 452 issues from 10 large opensource Java projects. The study proposes taxonomies for the leak types, for the defects causing them, and for the repair actions. We investigate, under several aspects, the distributions within each taxonomy and …