Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

155 Full-Text Articles | 242 Authors | 19,786 Downloads | 38 Institutions

All Articles in Computational Linguistics

155 full-text articles. Page 1 of 7.

Multimodal Depression Detection: An Investigation Of Features And Fusion Techniques For Automated Systems, Michelle Renee Morales 2018 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

Depression is a serious illness that affects a large portion of the world’s population. Given the large effect it has on society, it is evident that depression is a serious health issue. This thesis evaluates, at length, how technology may aid in assessing depression. We present an in-depth investigation of features and fusion techniques for depression detection systems. We also present OpenMM: a novel tool for multimodal feature extraction. Lastly, we present novel techniques for multimodal fusion. The contributions of this work add considerably to our knowledge of depression detection systems and have the potential to improve future systems ...


Speech Perception In “Bubble” Noise: Korean Fricatives And Affricates By Native And Non-Native Korean Listeners, Jiyoung Choi 2018 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

The current study examines acoustic cues used by second language learners of Korean to discriminate between Korean fricatives and affricates in noise and how these cues relate to those used by native Korean listeners. Stimuli consist of naturally-spoken consonant-vowel-consonant-vowel (CVCV) syllables: /sɑdɑ/, /s*ɑdɑ/, /tʃɑdɑ/, /tʃʰɑdɑ/, and /tʃ*ɑdɑ/. In this experiment, the “bubble noise” methodology of Mandel et al. (2016) was used to identify the time-frequency locations of important cues in each utterance, i.e., where audibility of the location is significantly correlated with correct identification of the utterance in noise. Results show that non-native Korean listeners ...


Intergroup Variability In Personality Recognition, Arundhati Sengupta 2018 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

Automatic identification of personality in conversational speech has many applications in natural language processing, such as leader identification in a meeting, adaptive dialogue systems, and dating websites. However, the widespread acceptance of automatic personality recognition through lexical and vocal characteristics is limited by the variability of the error rate of a general-purpose model among speakers from different demographic groups. While other work reports accuracy, we explored error rates of the automatic personality recognition task using classification models for different genders and native language groups (L1). We also present a statistical experiment showing the influence of gender and L1 on the relation ...


Describing Doggo-Speak: Features Of Doggo Meme Language, Jennifer Bivens 2018 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

Doggo-speak is a specialized way of writing most commonly associated with captions on Doggo memes, humorous images of dogs shared in online communities. This paper will explore linguistic features of Doggo-speak through analysis of social media posts by Doggo fan pages. It will use the discussed features as inputs to five machine learning classifiers and will show, through this classification task, that the discussed features are sufficient for distinguishing between Doggo-speak and more general English text.


Automatic Analysis Of Musical Lyrics, Joanna Gormley 2018 Merrimack College

Honors Senior Capstone Projects

Is music getting less sophisticated over time? That is the question which this study aims to answer, with the goal of improving upon previous analysis done on the topic. The blog posts which inspired this project lacked accuracy and dimensionality. Realizing that a larger data set of songs would make a significant difference in the precision of our analysis, we set out to design a piece of software constructed with the capability to analyze several thousand songs. Mimicking previous works which analyzed sophistication of music, the software focuses on the lyrics of songs. Three metrics were used in order to ...


Role Of Information Technology In Development Of Eritrean Language - ኣበርክቶ ቴክኖሎጂ ሓበሬታ ኣብ ምምዕባል ቋንቋታት ኤርትራ, Filmon Gebreyesus Ph.D 2018 Santa Clara University

Symposium on Eritrean Literature

Information technology affects every day of our lives, and social media in particular has become the main means of communication in our society. However, access to this current and ever-growing technology has always been limited to English, Arabic, or other languages, because our language has not kept up to speed with current technology.

Though there have been many efforts to develop application programs in Tigrigna and other languages to help us use our language, there are still many gaps that could be filled to achieve the competence of our languages. In ...


Detecting Language Impairments In Autism: A Computational Analysis Of Semi-Structured Conversations With Vector Semantics, Adam Goodkind, Michelle Lee, Gary E. Martin, Molly Losh, Klinton Bicknell 2018 Northwestern University

Proceedings of the Society for Computation in Linguistics

Many of the most significant impairments faced by individuals with autism spectrum disorder (ASD) relate to pragmatic (i.e. social) language. There is also evidence that pragmatic language differences may map to ASD-related genes. Therefore, quantifying the social-linguistic features of ASD has the potential to both improve clinical treatment and help identify gene-behavior relationships in ASD. Here, we apply vector semantics to transcripts of semi-structured interactions with children with both idiopathic and syndromic ASD. We find that children with ASD are less semantically similar to a gold standard derived from typically developing participants, and are more semantically variable. We show ...


Grammar Size And Quantitative Restrictions On Movement, Thomas Graf 2018 Stony Brook University

Proceedings of the Society for Computation in Linguistics

Recently it has been proved that every Minimalist grammar can be converted into a strongly equivalent single movement normal form such that every phrase moves at most once in every derivation. The normal form conversion greatly simplifies the formalism and reduces the complexity of movement dependencies, but it also runs the risk of greatly increasing the size of the grammar. I show that no such blow-up obtains with linguistically plausible grammars that respect common constraints on movement. This establishes not only the cost-free nature of this normal form for realistic grammars, but also that the known restrictions on movement greatly ...


Modeling The Decline In English Passivization, Liwen Hou, David Smith 2018 Northeastern University

Proceedings of the Society for Computation in Linguistics

Evidence from the Hansard corpus shows that the passive voice in British English has declined in relative frequency over the last two centuries. We investigate which factors are predictive of whether transitive verb phrases are passivized. We show the increasing importance of the person-hierarchy effects observed by Bresnan et al. (2001), with increasing strength of the constraint against passivizing clauses with local agents, as well as the rising prevalence of such agents. Moreover, our ablation experiments on the Wall Street Journal and Hansard corpora provide support for the unmarked information structure of ‘given’ before ‘new’ noted by Halliday (1967).


A Bidirectional Mapping Between English And CNF-Based Reasoners, Steven Abney 2018 University of Michigan

Proceedings of the Society for Computation in Linguistics

If language is a transduction between sound and meaning, the target of semantic interpretation should be the meaning representation expected by general cognition. Automated reasoners provide the best available fully-explicit proxies for general cognition, and they commonly expect Clause Normal Form (CNF) as input. There is a well-known algorithm for converting from unrestricted predicate calculus to CNF, but it is not invertible, leaving us without a means to transduce CNF back to English. I present a solution, with possible repercussions for the overall framework of semantic interpretation.


Differentiating Phrase Structure Parsing And Memory Retrieval In The Brain, Shohini Bhattasali, John Hale, Christophe Pallier, Jonathan Brennan, Wen-Ming Luh, R. Nathan Spreng 2018 Cornell University

Proceedings of the Society for Computation in Linguistics

On some level, human sentence comprehension must involve both memory retrieval and structural composition. This study differentiates these two processes using neuroimaging data collected during naturalistic listening. Retrieval is formalized in terms of "multiword expressions" while structure-building is formalized in terms of bottom-up parsing. The results most strongly implicate Anterior Temporal regions for structure-building and Precuneus Cortex for memory retrieval.


Modeling The Complexity And Descriptive Adequacy Of Construction Grammars, Jonathan Dunn 2018 Illinois Institute of Technology

Proceedings of the Society for Computation in Linguistics

This paper uses the Minimum Description Length paradigm to model the complexity of CxGs (operationalized as the encoding size of a grammar) alongside their descriptive adequacy (operationalized as the encoding size of a corpus given a grammar). These two quantities are combined to measure the quality of potential CxGs against unannotated corpora, supporting discovery-device CxGs for English, Spanish, French, German, and Italian. The results show (i) that these grammars provide significant generalizations as measured using compression and (ii) that more complex CxGs with access to multiple levels of representation provide greater generalizations than single-representation CxGs.


Phonologically Informed Edit Distance Algorithms For Word Alignment With Low-Resource Languages, Richard T. McCoy, Robert Frank 2018 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

We present three methods for weighting edit distance algorithms based on linguistic information. These methods base their penalties on (i) phonological features, (ii) distributional character embeddings, or (iii) differences between cognate words. We also introduce a novel method for evaluating edit distance through the task of low-resource word alignment by using edit-distance neighbors in a high-resource pivot language to inform alignments from the low-resource language. At this task, the cognate-based scheme outperforms our other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns.
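As a rough illustration of the first idea in this abstract (penalties based on phonological features), the following minimal sketch shows a Levenshtein-style edit distance with a pluggable substitution cost. It is not the authors' implementation: the function names, the toy feature table `FEATURES`, and the cost formula (symmetric difference of feature sets divided by their union) are all assumptions made for illustration.

```python
from typing import Callable

def weighted_edit_distance(source: str, target: str,
                           sub_cost: Callable[[str, str], float],
                           indel_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance with a custom substitution cost."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into the empty string (or back) costs one indel per symbol.
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,   # deletion
                dist[i][j - 1] + indel_cost,   # insertion
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]

# Hypothetical toy feature table (an assumption, not taken from the paper).
FEATURES = {
    "p": {"labial", "stop"},
    "b": {"labial", "stop", "voiced"},
    "t": {"coronal", "stop"},
    "d": {"coronal", "stop", "voiced"},
    "s": {"coronal", "fricative"},
    "z": {"coronal", "fricative", "voiced"},
}

def uniform_sub_cost(a: str, b: str) -> float:
    """Plain Levenshtein substitution: 0 for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0

def feature_sub_cost(a: str, b: str) -> float:
    """Cost in [0, 1]: the share of features on which the two segments differ."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0  # fall back to the uniform cost for unknown symbols
    return len(fa ^ fb) / len(fa | fb)

if __name__ == "__main__":
    for other in ("bat", "sat"):
        print(other,
              weighted_edit_distance("pat", other, uniform_sub_cost),
              weighted_edit_distance("pat", other, feature_sub_cost))
```

Under the uniform cost, "bat" and "sat" are equally far from "pat". Under the toy feature cost, "bat" is closer, because /p/ and /b/ differ only in voicing. That is the kind of distinction a phonologically informed weighting is meant to capture.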


Conditions On Abruptness In A Gradient-Ascent Maximum Entropy Learner, Elliott Moreton 2018 University of North Carolina, Chapel Hill

Proceedings of the Society for Computation in Linguistics

When does a gradual learning rule yield gradual learning performance? This paper studies a gradient-ascent Maximum Entropy phonotactic learner, as applied to two-alternative forced-choice performance expressed as log-odds. The main result is that slow initial performance cannot accelerate later if the initial weights are near zero, but can if they are not. Stated another way, abruptness in this learner is an effect of transfer, either from Universal Grammar in the form of an initial weighting, or from previous learning in the form of an acquired weighting.


Using Rhetorical Topics For Automatic Summarization, Natalie M. Schrimpf 2018 Yale University

Proceedings of the Society for Computation in Linguistics

Summarization involves finding the most important information in a text in order to convey the meaning of the document. In this paper, I present a method for using topic information to influence which content is selected for a summary. Texts are divided into topics using rhetorical information that creates a partition of a text into a sequence of non-overlapping topics. To investigate the effect of this topic structure, I compare the output of summarizing an entire text without topics to summarizing individual topics and combining them into a complete summary. The results show that the use of these rhetorical topics ...


Imdlawn Tashlhiyt Berber Syllabification Is Quantifier-Free, Kristina Strother-Garcia 2018 University of Delaware

Proceedings of the Society for Computation in Linguistics

Imdlawn Tashlhiyt Berber (ITB) is unusual due to its tolerance of non-vocalic syllabic nuclei. Rule-based and constraint-based accounts of ITB syllabification do not directly address the question of how complex the process is. Model theory and formal logic allow for comparison of complexity across different theories of phonology by identifying the computational power (or expressivity) of linguistic formalisms in a grammar-independent way. With these tools, I develop a mathematical formalism for representing ITB syllabification using Quantifier-Free (QF) logic, one of the least powerful logics known. This result indicates that ITB syllabification is relatively simple from a computational standpoint and that ...


Towards A Formal Description Of NPI-Licensing Patterns, Mai Ha Vu 2018 University of Delaware

Proceedings of the Society for Computation in Linguistics

This paper is a formal study of a simplified version of Negative Polarity Item (NPI) licensing requirements in two languages, English and Hungarian. In the framework of Model-Theoretic Syntax, using logical formalisms defined over tree-languages, I show that neither pattern can be described with Tier-based Strictly Local (TSL) constraints only, and suggest that they need a more complex logical formula. In particular, Hungarian patterns can be described using a combination of Tier-based Strictly 2-Local constraints over dominance relations and Locally 1-Testable constraints over the left-of relations between nodes. For English, there are no sufficient local constraints, either with or without ...


The Organization Of Lexicons: A Cross-Linguistic Analysis Of Monosyllabic Words, Shiying Yang, Chelsea Sanker, Uriel Cohen Priva 2018 Brown University

Proceedings of the Society for Computation in Linguistics

Lexicons utilize a fraction of licit structures. Different theories predict either that lexicons prioritize contrastiveness or structural economy. Study 1 finds that the monosyllabic lexicon of Mandarin is no more distinctive than a randomly sampled baseline using the phonological inventory. Study 2 finds that the lexicons of Mandarin and American English have fewer phonotactically complex words than the random baseline: Words tend not to have multiple low-probability components. This suggests that phonological constraints can have superadditive penalties for combined violations, consistent with e.g. Albright (ms.).


Word Learning As Category Formation, Spencer Caplan 2018 University of Pennsylvania

Proceedings of the Society for Computation in Linguistics

A fundamental question in word learning is how, given only evidence about what objects a word has previously referred to, children are able to generalize to the total class (Smith, 1979; Xu and Tenenbaum, 2007). For example, how does a child end up knowing that 'poodle' picks out only a specific subset of dogs rather than the whole class, and vice versa? Here we present a computational model of word learning which accounts for a wide range of previously conflicting experimental findings.


Distributed Morphology As A Regular Relation, Marina Ermolaeva, Daniel Edmiston 2018 University of Chicago

Proceedings of the Society for Computation in Linguistics

This research reorganizes the Distributed Morphology (DM) framework to work over strings. Typically, DM operates on binary trees, with the syntax-morphology interface implicitly treated as a tree-transducer. We contend that using (binary) trees is overpowered, predicting patterns unattested in natural language. Assuming the standard Y-model, DM operating on trees presumes that the flattening of the derivation for PF takes place post-morphology. We however flatten the structure above the morphological module, between the syntax and morphology. Restricting the morphological component to working on strings, we correctly predict that morphology can be modeled with regular string languages.

