Open Access. Powered by Scholars. Published by Universities.®

Theory and Algorithms Commons

Articles 1 - 2 of 2

Full-Text Articles in Theory and Algorithms

An Efficient Transformer-Based Model For Vietnamese Punctuation Prediction, Hieu Tran, Cuong V. Dinh, Hong Quang Pham, Binh T. Nguyen Jul 2021

Research Collection School Of Computing and Information Systems

In both formal and informal texts, missing punctuation marks make the text confusing and hard to read. This paper conducts exhaustive experiments to investigate the benefits of pre-trained Transformer-based models on two Vietnamese punctuation datasets. The experimental results show that our models achieve encouraging results, and that adding Bi-LSTM and/or CRF layers on top of the proposed models can further boost performance. Finally, our best model significantly outperforms state-of-the-art approaches on both the novel and news datasets for the Vietnamese language, with corresponding gains of up to 21.45% and 18.27% in overall F1-score.


Large-Scale Online Feature Selection For Ultra-High Dimensional Sparse Data, Yue Wu, Steven C. H. Hoi, Tao Mei, Nenghai Yu Aug 2017

Research Collection School Of Computing and Information Systems

Feature selection (FS) is an important technique in machine learning and data mining, especially for large-scale, high-dimensional data. Most existing studies have been restricted to batch learning, which is often inefficient and scales poorly when handling big data in the real world. Because real data may arrive sequentially and continuously, batch learning has to retrain the model on newly arriving data, which is computationally intensive. Online feature selection (OFS) is a promising new paradigm that is more efficient and scalable than batch learning algorithms. However, existing online algorithms usually fall short in efficacy. In this article, …