Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computational Linguistics

Portland State University

University Honors Theses

Full-Text Articles in Physical Sciences and Mathematics

Automatic Keyphrase Extraction From Russian-Language Scholarly Papers In Computational Linguistics, Yves Wienecke Jul 2020

University Honors Theses

The automatic extraction of keyphrases from scholarly papers is a necessary step for many Natural Language Processing (NLP) tasks, including text retrieval, machine translation, and text summarization. However, due to the different grammatical and semantic intricacies of languages, this is a highly language-dependent task. Many free and open source implementations of state-of-the-art keyphrase extraction techniques exist, but they are not adapted for processing Russian text. Furthermore, the multi-linguistic character of scholarly papers in the field of Russian computational linguistics and NLP introduces additional complexity to keyphrase extraction. This paper describes a free and open source program as a proof of …


Empirical Analysis Of CBOW And Skip Gram NLP Models, Tejas Menon Jul 2020

University Honors Theses

CBOW and Skip Gram are two NLP techniques for producing word embedding models that are accurate and performant. They were introduced in the seminal paper by T. Mikolov et al. and have since seen optimizations such as negative sampling and subsampling. This paper implements a fully optimized version of these models using PyTorch and runs them through a toy sentiment/subject analysis. It is weakly observed that different corpus types affect the skew of word embeddings, such that fictional corpora are better suited for sentiment analysis and non-fictional corpora for subject analysis.
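
As a rough illustration of how the two techniques named in this abstract differ, the sketch below builds training examples from a token list: Skip Gram pairs each center word with each nearby context word, while CBOW groups the whole context window to predict the center word. This is not the thesis's implementation. The function names `skipgram_pairs` and `cbow_examples`, the `window` parameter, and the sample sentence are assumptions made for illustration, and negative sampling, subsampling, and the PyTorch model itself are left out.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs; Skip Gram predicts context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples; CBOW predicts center from context."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip degenerate one-word inputs
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    # Hypothetical sample sentence, chosen only for illustration.
    tokens = "the quick brown fox".split()
    print(skipgram_pairs(tokens, window=1))
    print(cbow_examples(tokens, window=1))
```

In a full model, each word in these examples would typically be mapped to an integer index and fed through an embedding layer; that step is omitted here to keep the sketch dependency-free.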