Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Artificial Intelligence and Robotics

City University of New York (CUNY)

2019

Natural Language Processing

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Cs04all: Natural Language Processing Project, Hunter R. Johnson Feb 2019

Open Educational Resources

In this archive there are two activities/assignments suitable for use in a CS0 or Intro course which uses Python.

In the first activity, students are asked to "fill in the code" in a series of short programs that compute a similarity metric (cosine similarity) for text documents. This involves string tokenization and frequency counting using Python string methods and datatypes.

https://cocalc.com/share/bde99afd-76c8-493d-9608-db9019bcd346/171/Proj1?viewer=share/
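The first activity's pipeline (tokenize, count word frequencies, compare with cosine similarity) can be sketched roughly as follows. This is a minimal illustration, not the code from the archive: the function names `tokenize` and `cosine_similarity`, and the choice to lowercase and strip punctuation, are assumptions made for the example.

```python
import math
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text, split on whitespace, strip surrounding punctuation.

    The tokenization rules here are an assumption for illustration; the
    archived activity may tokenize differently.
    """
    words = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        if word:
            words.append(word)
    return words


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-frequency vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the words the two documents share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    # Euclidean length of each frequency vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty document has no direction; treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("The cat sat on the mat.", "The cat sat on the hat."))
    print(cosine_similarity("cat", "dog"))
```

Documents with identical word counts score close to 1, and documents with no words in common score 0. Only `str.lower`, `str.split`, `str.strip`, and a `Counter` are needed, which matches the "string methods and datatypes" scope described above.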

In the second activity (taken directly from Think Python 2e) students use a pronunciation dictionary to solve a riddle involving homophones.

https://cocalc.com/share/bde99afd-76c8-493d-9608-db9019bcd346/171/Dicts2?viewer=share/
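The description above does not state the riddle itself. As a rough sketch, assume it asks for a five-letter word that sounds the same after removing either its first or its second letter. The tiny `PRONUNCIATIONS` dictionary below is a hypothetical stand-in for a real pronunciation dictionary, and all function names are invented for this example.

```python
# Hypothetical toy data: word -> phoneme string, written in the style of a
# pronunciation dictionary. A real activity would load a full dictionary.
PRONUNCIATIONS = {
    "scent": "S EH1 N T",
    "cent": "S EH1 N T",
    "sent": "S EH1 N T",
    "wrack": "R AE1 K",
    "rack": "R AE1 K",
    "wack": "W AE1 K",
}


def are_homophones(word_a, word_b, pron):
    """True if both words are in the dictionary with the same pronunciation."""
    if word_a not in pron or word_b not in pron:
        return False
    return pron[word_a] == pron[word_b]


def solves_riddle(word, pron):
    """Check the assumed riddle: dropping the first letter or the second
    letter must each leave a homophone of the original word."""
    without_first = word[1:]
    without_second = word[0] + word[2:]
    return (are_homophones(word, without_first, pron)
            and are_homophones(word, without_second, pron))


def find_solutions(pron):
    """Return every five-letter dictionary word that solves the riddle."""
    return [w for w in sorted(pron) if len(w) == 5 and solves_riddle(w, pron)]


if __name__ == "__main__":
    print(find_solutions(PRONUNCIATIONS))
```

With this toy data, "wrack" fails because "wack" is pronounced differently, which shows why both removals have to be checked. The dictionary lookup does all the work, so the exercise is mainly practice with `dict` membership tests and string slicing.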

This OER material was produced as a result of the CS04ALL CUNY OER project.


Culture Clubs: Processing Speech By Deriving And Exploiting Linguistic Subcultures, David Guy Brizan Feb 2019

Dissertations, Theses, and Capstone Projects

Spoken language understanding systems are error-prone for several reasons, including individual speech variability. This variability manifests in many ways, among them differences in pronunciation, lexical inventory, grammar, and disfluencies. There is, however, substantial evidence pointing to stable language usage within subgroups of a language population. We call these subgroups linguistic subcultures.

The two broad problems are defined and a survey of the work in this space is performed. The two broad problems are: linguistic subculture detection, commonly performed via Language Identification, Accent Identification or Dialect Identification approaches; and speech and language processing tasks taken which may see …