Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

The University of Akron

Theses/Dissertations

2024

Artificial intelligence

A Comparison Of Lexical Tokenization Methods, Nathan Culmer Jan 2024

Williams Honors College, Honors Research Projects

The purpose of this project was to compare tokenization methods, that is, methods of breaking a text into meaningful units for use in natural language processing. The effectiveness of several commonly used tokenization methods was investigated, including morpheme tokenization, which takes into account the linguistic features of the language. In addition, I proposed and implemented a new technique that incorporates the capitalization pattern of a word into the tokenization process, allowing that process to capture more natural language features. The effectiveness of these methods was compared by using them in a sentiment analysis model for various datasets, …
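The abstract does not say how the capitalization pattern is encoded, so the following is only an illustrative sketch of one widely used approach, not the author's technique. The assumption is that each word is lowercased and, when it contains uppercase letters, preceded by a separate marker token describing its pattern (leading capital, all caps, or mixed). The marker names `<cap>`, `<upper>`, and `<mixed>`, the regular expression, and the function names are hypothetical choices made for this example.

```python
import re
from typing import List, Optional

# Hypothetical marker tokens (illustrative assumption, not from the project).
CAP_FIRST = "<cap>"    # Only the first letter is uppercase, e.g. "Hello"
CAP_ALL = "<upper>"    # Every letter is uppercase, e.g. "NASA"
CAP_MIXED = "<mixed>"  # Any other pattern with uppercase, e.g. "iPhone"

# Split into runs of word characters or single punctuation marks.
WORD_RE = re.compile(r"\w+|[^\w\s]")


def capitalization_marker(word: str) -> Optional[str]:
    """Return a marker for the word's capitalization, or None if it has no uppercase letters."""
    letters = [c for c in word if c.isalpha()]
    if not letters or word.lower() == word:
        return None
    # Treat single capital letters such as "I" as CAP_FIRST, not CAP_ALL.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return CAP_ALL
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return CAP_FIRST
    return CAP_MIXED


def tokenize_with_case(text: str) -> List[str]:
    """Lowercase each token, inserting a capitalization marker before it when needed."""
    tokens: List[str] = []
    for word in WORD_RE.findall(text):
        marker = capitalization_marker(word)
        if marker is not None:
            tokens.append(marker)
        tokens.append(word.lower())
    return tokens


if __name__ == "__main__":
    print(tokenize_with_case("NASA launched the iPhone app, Hello!"))
```

With this design, "Apple" and "apple" share a single vocabulary entry, while the separate marker still passes the capitalization signal to a downstream model such as a sentiment classifier.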