Full-Text Articles in Theory and Algorithms
An Efficient Transformer-Based Model For Vietnamese Punctuation Prediction, Hieu Tran, Cuong V. Dinh, Hong Quang Pham, Binh T. Nguyen
Research Collection School Of Computing and Information Systems
In both formal and informal texts, missing punctuation marks make the text confusing and hard to read. This paper conducts exhaustive experiments to investigate the benefits of pre-trained Transformer-based models on two Vietnamese punctuation datasets. The experimental results show that our models achieve encouraging results, and that adding Bi-LSTM and/or CRF layers on top of the proposed models can further boost performance. Finally, our best model significantly outperforms state-of-the-art approaches on both the novel and news datasets for the Vietnamese language, with gains of up to 21.45% and 18.27%, respectively, in overall F1-score.
Large-Scale Online Feature Selection For Ultra-High Dimensional Sparse Data, Yue Wu, Steven C. H. Hoi, Tao Mei, Nenghai Yu
Research Collection School Of Computing and Information Systems
Feature selection (FS) is an important technique in machine learning and data mining, especially for large-scale high-dimensional data. Most existing studies have been restricted to batch learning, which is often inefficient and scales poorly when handling big data in the real world. As real data may arrive sequentially and continuously, batch learning has to retrain the model for newly arriving data, which is computationally very intensive. Online feature selection (OFS) is a promising new paradigm that is more efficient and scalable than batch learning algorithms. However, existing online algorithms usually fall short in efficacy. In this article, …