Open Access. Powered by Scholars. Published by Universities.®

Arts and Humanities Commons

Computer Sciences

Research Collection School Of Computing and Information Systems

2019

Sequence labeling

Full-Text Articles in Arts and Humanities

Punctuation Prediction For Vietnamese Texts Using Conditional Random Fields, Hong Quang Pham, Binh T. Nguyen, Nguyen Viet Cuong Dec 2019

Research Collection School Of Computing and Information Systems

We investigate punctuation prediction for the Vietnamese language. This problem is important because its solution can add suitable punctuation marks to machine-transcribed speech, which usually lacks them. As in previous work on English and Chinese, we formulate the task as a sequence labeling problem. We then apply a conditional random field model and propose a set of features that are useful for prediction. Moreover, we build two corpora from Vietnamese online news and movie subtitles and perform extensive experiments on these data. Finally, we ask four volunteers …
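
To illustrate the sequence labeling formulation mentioned in the abstract, the following is a minimal sketch of how punctuated text might be turned into (token, label) pairs, with each label naming the punctuation mark that follows the token. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`), the whitespace tokenization, and the `window_features` helper are assumptions made for illustration; they are not taken from the paper, whose actual label set and feature set are not given in this listing.

```python
# Hypothetical label set (an assumption, not the paper's): "O" means that
# no punctuation mark follows the token.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_sequence_labels(text):
    """Convert punctuated text into (token, label) pairs."""
    pairs = []
    for raw in text.split():
        # Separate the word from any trailing punctuation marks.
        word = raw.rstrip("".join(PUNCT_LABELS))
        trailing = raw[len(word):]
        if not word:
            # A standalone punctuation token labels the previous word.
            if pairs and trailing:
                pairs[-1] = (pairs[-1][0], PUNCT_LABELS[trailing[0]])
            continue
        label = PUNCT_LABELS[trailing[0]] if trailing else "O"
        pairs.append((word.lower(), label))
    return pairs


def window_features(tokens, i, size=1):
    """Illustrative context-window features for the token at position i."""
    feats = {"w0": tokens[i]}
    for d in range(1, size + 1):
        feats[f"w-{d}"] = tokens[i - d] if i - d >= 0 else "<s>"
        feats[f"w+{d}"] = tokens[i + d] if i + d < len(tokens) else "</s>"
    return feats


pairs = to_sequence_labels("Hôm nay trời đẹp, chúng ta đi chơi nhé.")
tokens = [token for token, _ in pairs]
labels = [label for _, label in pairs]
features = [window_features(tokens, i) for i in range(len(tokens))]
```

Under these assumptions, a conditional random field would be trained on `features` as the inputs and `labels` as the targets; at prediction time the unpunctuated tokens are labeled and the predicted marks are reinserted after the corresponding tokens.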