Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

A Study Into The Feasibility Of Using Natural Language Processing And Machine Learning For The Identification Of Alcohol Misuse In Trauma Patients, Andrew Phillips Jan 2018

Master's Theses

Alcohol misuse is a leading cause of premature death in the United States, with nearly a third of trauma patients found to have elevated blood alcohol levels upon admission. However, timely intervention has been shown to reduce this. It is thus important to be able to screen patients quickly for alcohol misuse. Many medical centers use standardized questionnaires to identify alcohol misuse, but because these instruments are not usually part of routine care, screening is often not done.

In this study, large quantities of notes were processed with natural language processing and machine learning …


Skewer: Sentiment Knowledge Extraction With Entity Recognition, Christopher James Wu Jun 2016

Master's Theses

The California state legislature introduces approximately 5,000 new bills each legislative session. While the legislative hearings are recorded on video, the recordings are not easily accessible to the public. The lack of official transcripts or summaries also increases the effort required to gain meaningful insight from those recordings. Therefore, the news media and the general population are largely oblivious to what transpires during legislative sessions.

Digital Democracy, a project started by the Cal Poly Institute for Advanced Technology and Public Policy, is an online platform created to bring transparency to the California legislature. It features a searchable database of state …


Categorizing Blog Spam, Brandon Bevans Jun 2016

Master's Theses

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet.

Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to …


The Application Of P-Bar Theory In Transformation-Based Error-Driven Learning, Bryant Harold Walley Dec 2014

Master's Theses

In P-bar Theory, Perkins et al. (2014) proposed a rule-based method for determining the context of a partext (i.e., a part of a text document).

In "Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging," Brill (1995) demonstrates a method of error-driven learning applied to individual words at the sentence level to determine the part of speech each word represents.

We combine these two concepts, providing a transformation-based error-driven learning algorithm to improve the results obtained from the static rules Perkins proposed and to determine whether the rule order prediction will provide additional metadata.


Misheard Me Oronyminator: Using Oronyms To Validate The Correctness Of Frequency Dictionaries, Jennifer G. Hughes Jun 2013

Master's Theses

In the field of speech recognition, an algorithm must learn to tell the difference between "a nice rock" and "a gneiss rock". These identical-sounding phrases are called oronyms. Word frequency dictionaries are often used by speech recognition systems to help resolve phonetic sequences with more than one possible orthographic phrase interpretation, by looking up which oronym of the root phonetic sequence contains the most-common words.
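The frequency lookup described above can be illustrated with a minimal sketch. The word counts, the function names `phrase_score` and `pick_oronym`, and the scoring rule (a sum of log frequencies with a small floor for unseen words) are all assumptions made for illustration; they are not taken from the thesis or from the UNISYN dictionary.

```python
from math import log

# Hypothetical per-word counts (made-up numbers, not UNISYN values).
WORD_FREQ = {"a": 1000, "nice": 120, "gneiss": 1, "rock": 80}


def phrase_score(phrase, freq=WORD_FREQ):
    """Sum of log-frequencies of the words in a phrase.

    Words missing from the dictionary get an assumed floor count of 0.5.
    """
    return sum(log(freq.get(word, 0.5)) for word in phrase.split())


def pick_oronym(candidates, freq=WORD_FREQ):
    """Return the candidate phrase whose words are most frequent overall."""
    return max(candidates, key=lambda phrase: phrase_score(phrase, freq))


# Two orthographic readings of the same phonetic sequence.
best = pick_oronym(["a nice rock", "a gneiss rock"])
```

With these illustrative counts, "a nice rock" is preferred because "nice" is far more frequent than "gneiss". Other scoring rules, such as comparing only the rarest word in each phrase, would fit the same description.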

Our paper demonstrates a technique used to validate word frequency dictionary values. We chose to use frequency values from the UNISYN dictionary, which tallies each word on a per-occurrence basis, using a proprietary text corpus, …