Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 17 of 17

Full-Text Articles in Entire DC Network

Does Accuracy Matter?: Methodological Considerations When Using Automated Speech-To-Text For Social Science Research, Steven J. Pentland, Christie M. Fuller, Lee A. Spitzley, Douglas P. Twitchell Jan 2023

IT and Supply Chain Management Faculty Publications and Presentations

The analysis of spoken language has been integral to a breadth of research in social science and beyond. However, for analyses to occur with efficiency, language must be in the form of computer-readable text. Historically, the speech-to-text process has occurred manually using human transcriptionists. Automated speech recognition (ASR) is advertised as an efficient and inexpensive alternative, but research shows this method of speech-to-text is prone to error. This paper investigates the viability of using error-prone ASR transcriptions as part of the methodological process of language analysis. Results show that at the individual feature level, analysis of ASR transcriptions differ …


Automatic Scoring Of Speeded Interpersonal Assessment Center Exercises Via Machine Learning: Initial Psychometric Evidence And Practical Guidelines, Louis Hickman, Christoph N. Herde, Filip Lievens, Louis Tay Jan 2023

Research Collection Lee Kong Chian School Of Business

Assessment center (AC) exercises such as role-plays have established themselves as valuable approaches for obtaining insights into interpersonal behavior, but they are often considered the “Rolls Royce” of personnel assessment due to their high costs. The observation and rating process comprises a substantial part of these costs. In an exploratory case study, we capitalize on recent advances in natural language processing (NLP) by developing NLP-based machine learning (ML) models to investigate the possibility of automatically scoring AC exercises. First, we compared the convergent-related validity and contamination with word count of ML scores based on models that used different NLP methods …


Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (CRAML), Stephen Meisenbacher, Peter Norlander Dec 2022

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Cloud-Based Machine Learning And Sentiment Analysis, Emmanuel C. Opara Jan 2022

Electronic Theses and Dissertations

The role of a Data Scientist is becoming increasingly ubiquitous as companies and institutions see the need to gain additional insights and information from data to make better decisions to improve the quality-of-service delivery to customers. This thesis document contains three aspects of data science projects aimed at improving tools and techniques used in analyzing and evaluating data. The first research study applied cloud-based auto-machine learning algorithms to a standard cybersecurity dataset to detect vulnerabilities in the network traffic data. The performance of the algorithms was measured and compared using standard evaluation metrics. The second …


Statistics-Based Anomaly Detection And Correction Method For Amazon Customer Reviews, Ishani Chatterjee Dec 2021

Dissertations

People nowadays use the Internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source of information for data analytics, sentiment analysis, natural language processing, etc. The most critical challenge is interpreting this data and capturing the sentiment behind these expressions. Sentiment analysis involves analyzing, processing, and drawing inferences from subjective texts that express views. Companies use sentiment analysis to understand public opinions, perform market research, analyze brand reputation, recognize customer experiences, and study social media influence. According to the different needs for aspect granularity, …


Codes Of Ethics: Extending Classification Techniques With Natural Language Processing, Zachary Glass, E. Susanna Cahn Dec 2021

The Journal of Values-Based Leadership

Language is an indicator of how stakeholders view an ethics code’s intent, and key to distinguishing code properties, such as promoting ethical-valued decision-making or code-based compliance. This article quantifies ethics codes’ language using Natural Language Processing (NLP), then uses machine learning to classify ethics codes. NLP overcomes some inherent difficulties of “measuring” verbal documents. Ethics codes selected from lists of “best” companies were compared with codes from a sample of Fortune 500 companies. Results show that some of these ethics codes are different enough from the norm to be distinguished by an algorithm; indicating as well that lists of “best” …


Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun Dec 2021

SMU Data Science Review

This study investigates a comparison of classification models used to determine aspect-based separated text sentiment and predict binary sentiments of movie reviews with genre and aspect specific driving factors. To gain a broader classification analysis, five machine and deep learning algorithms were compared: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network Long-Short-Term Memory (RNN LSTM). The various movie aspects that are utilized to separate the sentences are determined through aggregating aspect words from lexicon-based, supervised and unsupervised learning. The driving factors are randomly assigned to various movie aspects and their impact tied to …


Text Analytics, NLP, And Accounting Research, Richard M. Crowley Apr 2020

Research Collection School Of Accountancy

The presentation covered: What is text analytics and NLP?; How text analytics has evolved in the accounting literature since the 1980s; What current (as of 2020) methods are used in the literature; What methods are on the horizon.


Women On Boards Of Philippine Corporations: Quantitative Explorations, Maria C G Bautista, Marlene De Leon, Rudyard Jose R. Nano IV Jan 2020

Graduate School of Business Publications

This inductive study explored the likelihood and correlates of gender diversity in corporate boards in the Philippines. The improvement of gender diversity on boards is of advocacy and policy interest as the country emerges to middle-high income status. Logistic regression analyses from individuals' (in a directors' talent pool) responses to an online survey showed that females had a likely odds of 0.10 to be on the boards, compared to males. For every one female getting onto boards, 9 would be unable to. Females with advanced degrees were 7x likely to be on boards than female and male counterparts. The odds …


Early Detection Of Fake News On Social Media, Yang Liu Dec 2019

Dissertations

The ever-increasing popularity and convenience of social media enable the rapid spread of fake news, which can cause a series of negative impacts both on individuals and society. Early detection of fake news is essential to minimize its social harm. Existing machine learning approaches are incapable of detecting a fake news story soon after it starts to spread, because they require certain amounts of data to reach decent effectiveness, which takes time to accumulate. To solve this problem, this research first analyzes and finds that, on social media, the user characteristics of fake news spreaders distribute significantly differently from those …


“Where’s The I-O?” Artificial Intelligence And Machine Learning In Talent Management Systems, Manuel F. Gonzalez, John F. Capman, Frederick L. Oswald, Evan R. Theys, David L. Tomczak Nov 2019

Personnel Assessment and Decisions

Artificial intelligence (AI) and machine learning (ML) have seen widespread adoption by organizations seeking to identify and hire high-quality job applicants. Yet the volume, variety, and velocity of professional involvement among I-O psychologists remains relatively limited when it comes to developing and evaluating AI/ML applications for talent assessment and selection. Furthermore, there is a paucity of empirical research that investigates the reliability, validity, and fairness of AI/ML tools in organizational contexts. To stimulate future involvement and research, we share our review and perspective on the current state of AI/ML in talent assessment as well as its benefits and potential pitfalls; …


Using AI To Analyze Patent Claim Indefiniteness, Dean Alderucci, Kevin D. Ashley Jan 2019

Articles

In this Article, we describe how to use artificial intelligence (AI) techniques to partially automate a type of legal analysis: determining whether a patent claim satisfies the definiteness requirement. Although fully automating such a high-level cognitive task is well beyond state-of-the-art AI, we show that AI can nevertheless assist the decision maker in making this determination. Specifically, the use of custom AI technology can aid the decision maker by (1) mining patent text to rapidly bring relevant information to the decision maker's attention, and (2) suggesting simple inferences that can be drawn from that information.

We begin by summarizing the …


Developing An Innovative Entity Extraction Method For Unstructured Data, Waleed A. Zaghloul, Silvana Trimi Jan 2017

Department of Management: Faculty Publications

The main goal of this study is to build high-precision extractors for entities such as Person and Organization as a good initial seed that can be used for training and learning in machine-learning systems, for the same categories, other categories, and across domains, languages, and applications. The improvement of entities extraction precision also increases the relationships extraction precision, which is particularly important in certain domains (such as intelligence systems, social networking, genetic studies, healthcare, etc.). These increases in precision improve the end users’ experience quality in using the extraction system because it lowers the time that users spend for training …


Information Filtering By Multiple Examples, Mingzhu Zhu May 2015

Dissertations

A key to successfully satisfying an information need lies in how users express it using keywords as queries. However, for many users, expressing their information needs using keywords is difficult, especially when the information need is complex. Search By Multiple Examples (SBME), a promising method for overcoming this problem, allows users to specify their information needs as a set of relevant documents rather than as a set of keywords.

Most studies on SBME adopt Positive Unlabeled learning (PU learning) techniques, treating the user-provided examples (denoted as query examples) as the positive set and the entire data …


Robust Determinants Of Bilateral Trade, Marianne Baxter, Jonathan Hersh May 2015

Economics Faculty Articles and Research

What are the policies and country-level conditions which best explain bilateral trade flows between countries? As databases expand, an increasing number of possible explanatory variables are proposed that influence bilateral trade without a clear indication of which variables are robustly important across contexts, time periods, and which are not sensitive to inclusion of other control variables. To shed light on this problem, we apply three model selection methods (Lasso regularized regression, Bayesian Model Averaging, and Extreme Bound Analysis) to candidate variables in a gravity model of trade. Using a panel of 198 countries covering the years 1970 to …


SVMAUD: Using Textual Information To Predict The Audience Level Of Written Works Using Support Vector Machines, Todd Will Jan 2014

Dissertations

Information retrieval systems should seek to match resources with the reading ability of the individual user; similarly, an author must choose vocabulary and sentence structures appropriate for his or her audience. Traditional readability formulas, including the popular Flesch-Kincaid Reading Age and the Dale-Chall Reading Ease Score, rely on numerical representations of text characteristics, including syllable counts and sentence lengths, to suggest audience level of resources. However, the author’s chosen vocabulary, sentence structure, and even the page formatting can alter the predicted audience level by several levels, especially in the case of digital library resources. For these reasons, the performance of …


Using Symbolic Knowledge In The UMLS To Disambiguate Words In Small Datasets With A Naive Bayes Classifier, Gondy Leroy, Thomas C. Rindflesch Jan 2004

CGU Faculty Publications and Research

Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly …