Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
Cs04all: Natural Language Processing Project, Hunter R. Johnson
Open Educational Resources
This archive contains two activities/assignments suitable for a CS0 or introductory course that uses Python.
In the first activity, students "fill in the code" in a series of short programs that compute a similarity metric (cosine similarity) between text documents. This involves string tokenization and frequency counting using Python string methods and data types.
https://cocalc.com/share/bde99afd-76c8-493d-9608-db9019bcd346/171/Proj1?viewer=share/
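The project's own starter code is at the CoCalc link; what follows is only a minimal sketch of the same idea, not that code. It treats each document as a vector of word counts and computes cos θ = (a · b) / (‖a‖ ‖b‖). The tokenization rule (lowercase, split on whitespace, strip surrounding punctuation) and the function names `tokenize`, `count_words` and `cosine_similarity` are assumptions chosen for illustration. It uses plain dictionaries and string methods rather than `collections.Counter`, in keeping with an intro course.

```python
import math
import string


def tokenize(text):
    """Lowercase, split on whitespace, and strip leading/trailing punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        if word:  # drop tokens that were pure punctuation
            tokens.append(word)
    return tokens


def count_words(tokens):
    """Build a word -> frequency dictionary."""
    counts = {}
    for word in tokens:
        counts[word] = counts.get(word, 0) + 1
    return counts


def cosine_similarity(doc1, doc2):
    """Cosine of the angle between the two documents' word-frequency vectors."""
    c1 = count_words(tokenize(doc1))
    c2 = count_words(tokenize(doc2))
    # Dot product: only words present in both documents contribute.
    dot = sum(n * c2[w] for w, n in c1.items() if w in c2)
    norm1 = math.sqrt(sum(n * n for n in c1.values()))
    norm2 = math.sqrt(sum(n * n for n in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document shares no words with anything
    return dot / (norm1 * norm2)


print(round(cosine_similarity("The cat sat.", "the cat ran"), 4))  # 0.6667
```

Because word counts are never negative, the result ranges from 0 (no shared words) to 1 (identical word proportions).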
In the second activity (taken directly from Think Python 2e) students use a pronunciation dictionary to solve a riddle involving homophones.
https://cocalc.com/share/bde99afd-76c8-493d-9608-db9019bcd346/171/Dicts2?viewer=share/
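The Think Python exercise and its dictionary-reading helper are not reproduced here. As a hedged sketch, assume the pronunciation dictionary uses the CMU Pronouncing Dictionary's plain-text layout (`WORD  PHONEME PHONEME ...`, with comment lines starting with `;;;`), and assume the riddle asks for a word that still sounds the same after dropping its first letter, and also after dropping its second letter. The helper names `read_pronunciations`, `is_homophone` and `solves_riddle`, and the five-entry toy sample, are hypothetical.

```python
def read_pronunciations(lines):
    """Map each lowercased word to its phoneme string.

    Assumes CMU-style lines 'WORD  PH1 PH2 ...'; lines starting with ';;;'
    are treated as comments and skipped.
    """
    pron = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, _, phonemes = line.partition(" ")
        pron[word.lower()] = phonemes.strip()
    return pron


def is_homophone(a, b, pron):
    """True if both words are in the dictionary and are pronounced identically."""
    return a in pron and b in pron and pron[a] == pron[b]


def solves_riddle(word, pron):
    """Hypothetical riddle: removing the first letter, or (separately) the
    second letter, must each leave a homophone of the original word."""
    if len(word) < 3:
        return False
    without_first = word[1:]
    without_second = word[0] + word[2:]
    return (is_homophone(word, without_first, pron)
            and is_homophone(word, without_second, pron))


# Toy sample in CMU format; a real run would read the full dictionary file.
SAMPLE = """\
;;; toy excerpt
SCENT  S EH1 N T
CENT  S EH1 N T
SENT  S EH1 N T
WRACK  R AE1 K
RACK  R AE1 K
"""

pron = read_pronunciations(SAMPLE.splitlines())
print([w for w in pron if solves_riddle(w, pron)])  # ['scent']
```

Here "wrack" passes the first check ("rack") but fails the second, because "wack" is not in the toy dictionary.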
This OER material was produced as a result of the CS04ALL CUNY OER project.
Culture Clubs: Processing Speech By Deriving And Exploiting Linguistic Subcultures, David Guy Brizan
Dissertations, Theses, and Capstone Projects
Spoken language understanding systems are error-prone for several reasons, including individual speech variability. This variability shows up in many ways, among them differences in pronunciation, lexical inventory, grammar and disfluencies. There is, however, considerable evidence of stable language usage within subgroups of a language population. We call these subgroups linguistic subcultures.
Two broad problems are defined, and the work in this space is surveyed. The two problems are: linguistic subculture detection, commonly performed via Language Identification, Accent Identification or Dialect Identification approaches; and speech and language processing tasks which may see …