Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Keyword: Language model

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

T-Sciq: Teaching Multimodal Chain-Of-Thought Reasoning Via Large Language Model Signals For Science Question Answering, Lei Wang, Yi Hu, Jiabang He, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen Mar 2024

Research Collection School Of Computing and Information Systems

Large Language Models (LLMs) have recently demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Moreover, the annotated rationales are often inaccurate because essential external information is missing. To address these issues, we propose a novel method termed T-SciQ that aims at teaching science question answering with …
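
The core recipe the abstract describes, generating CoT rationales with a teacher LLM and keeping them as training targets for a smaller student model, fits in a few lines. A minimal sketch follows, assuming a hypothetical `call_llm` text-generation client and a simple consistency filter (both illustrative choices, not the paper's exact pipeline):

```python
# Sketch: generate a chain-of-thought rationale with a teacher LLM and keep
# it as a fine-tuning target for a student model. `call_llm` is hypothetical.

def call_llm(prompt: str) -> str:
    # Stand-in for any text-generation API; returns a canned rationale here
    # so the sketch runs end to end.
    return ("Photosynthesis happens in the chloroplast, which contains "
            "chlorophyll. Answer: chloroplast")

COT_TEMPLATE = ("Question: {question}\nOptions: {options}\n"
                "Explain step by step, then state the final answer.")

def make_teaching_example(question: str, options: list, answer: str) -> dict:
    """Ask the teacher for a rationale; keep it only if it supports the gold answer."""
    rationale = call_llm(COT_TEMPLATE.format(question=question,
                                             options=", ".join(options)))
    if answer.lower() not in rationale.lower():
        return {}  # discard rationales that contradict the gold label
    # The student is later fine-tuned to map the input to the rationale.
    return {"input": f"{question} Options: {', '.join(options)}",
            "target": rationale}

print(make_teaching_example("Which organelle carries out photosynthesis?",
                            ["mitochondrion", "chloroplast"], "chloroplast"))
```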


Breaking Down Computer Networking Instructional Videos: Automatic Summarization With Video Attributes And Language Models, Totok Sukardiyono, Muhammad Irfan Luthfi, Nisa Dwi Septiyanti Dec 2023

Elinvo (Electronics, Informatics, and Vocational Education)

Instructional videos have become a popular tool for teaching complex topics in computer networking. However, these videos can often be lengthy and time-consuming, making it difficult for learners to obtain the key information they need. In this study, we propose an approach that leverages automatic summarization and language models to generate concise and informative summaries of instructional videos. To enhance the performance of the summarization algorithm, we also incorporate video attributes that provide contextual information about the video content. Using a dataset of computer networking tutorials, we evaluate the effectiveness of the proposed method and show that it significantly improves …
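
As a rough illustration of the combination the abstract describes, here is a minimal extractive-summarization sketch: transcript segments are scored by word salience and boosted when they coincide with assumed video attributes such as chapter boundaries. The boost weight and attribute set are illustrative assumptions, not the paper's configuration:

```python
# Minimal extractive summarizer: term-frequency salience plus an attribute boost.
from collections import Counter

def summarize(segments: list, boosted: set, k: int = 3) -> list:
    words = Counter(w.lower() for s in segments for w in s.split())
    def score(i: int, seg: str) -> float:
        toks = seg.split()
        base = sum(words[w.lower()] for w in toks) / max(len(toks), 1)
        return base * (1.5 if i in boosted else 1.0)  # attribute boost (assumed weight)
    ranked = sorted(range(len(segments)), key=lambda i: score(i, segments[i]),
                    reverse=True)
    return [segments[i] for i in sorted(ranked[:k])]  # keep original order

demo = ["Routers forward packets between networks.",
        "Now let's configure the router interface.",
        "Subnet masks split an address into network and host parts.",
        "Thanks for watching, remember to subscribe."]
print(summarize(demo, boosted={1}, k=2))
```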


On The Usage Of Continual Learning For Out-Of-Distribution Generalization In Pre-Trained Language Models Of Code, Martin Weyssow, Xin Zhou, Kisub Kim, David Lo, Houari A. Sahraoui Dec 2023

Research Collection School Of Computing and Information Systems

Pre-trained language models (PLMs) have become a prevalent technique in deep learning for code, utilizing a two-stage pre-training and fine-tuning procedure to acquire general knowledge about code and specialize in a variety of downstream tasks. However, the dynamic nature of software codebases poses a challenge to the effectiveness and robustness of PLMs. In particular, real-world scenarios can lead to significant differences between the distribution of the pre-training and test data, i.e., distribution shift, resulting in a degradation of the PLM's performance on downstream tasks. In this paper, we stress the need for adapting PLMs of code to software data whose …
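
For readers unfamiliar with continual learning, a replay buffer is one common baseline in this family: samples from earlier tasks are stored and rehearsed while fine-tuning on new data, which mitigates forgetting under distribution shift. The toy PyTorch sketch below uses a linear layer standing in for a code PLM; it illustrates the general technique, not the paper's specific setup:

```python
# Replay-based continual fine-tuning: rehearse stored old-task samples
# alongside new data. Model, loss, and data are placeholders.
import random
import torch
from torch import nn

model = nn.Linear(16, 2)                       # stand-in for a code PLM head
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
replay_buffer = []                             # (x, y) pairs from past tasks

def train_task(task_data, buffer_ratio=0.5, capacity=256):
    for x, y in task_data:
        batch = [(x, y)]
        if replay_buffer and random.random() < buffer_ratio:
            batch.append(random.choice(replay_buffer))   # rehearse old data
        for bx, by in batch:
            loss = loss_fn(model(bx), by)
            opt.zero_grad(); loss.backward(); opt.step()
        if len(replay_buffer) < capacity:
            replay_buffer.append((x.detach(), y))

# Simulate two sequential "tasks" drawn from shifted distributions.
for shift in (0.0, 3.0):
    data = [(torch.randn(4, 16) + shift, torch.randint(0, 2, (4,)))
            for _ in range(20)]
    train_task(data)
```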


Just Adjust One Prompt: Enhancing In-Context Dialogue Scoring Via Constructing The Optimal Subgraph Of Demonstrations And Prompts, Jiashu Pu, Ling Cheng, Lu Fan, Tangjie Lv, Rongsheng Zhang Dec 2023

Research Collection School Of Computing and Information Systems

The use of modern Large Language Models (LLMs) as chatbots still has some problems, such as hallucination and a lack of empathy. Identifying these issues can help improve chatbot performance. The community has been continually iterating on reference-free dialogue evaluation methods based on large language models (LLMs) that can be readily applied. However, many of these LLM-based metrics require selecting specific datasets and developing specialized training tasks for different evaluation dimensions (e.g., coherence, informativeness). This development step can be time-consuming and may need to be repeated for new evaluation dimensions. To enable efficient and flexible adaptation to diverse needs of dialogue …
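
The "adjust one prompt" idea can be approximated generically: search over combinations of a prompt and a small demonstration set, keeping the configuration whose scores best match a handful of human-rated dialogues. The sketch below is that generic search, with a dummy `llm_score` standing in for an actual LLM scoring call; it is not the paper's subgraph construction:

```python
# Generic search over (prompt, demonstrations) configurations against a
# small human-scored dev set. `llm_score` is a dummy placeholder.
from itertools import combinations

def llm_score(prompt: str, demos: tuple, dialogue: str) -> int:
    # Dummy scorer so the sketch runs; a real one would call an LLM.
    return (len(prompt) + sum(len(d) for d in demos) + len(dialogue)) % 5 + 1

def pick_configuration(prompts, demo_pool, dev_set, k=2):
    """dev_set: list of (dialogue, human_score) pairs."""
    def error(p, demos):
        return sum(abs(llm_score(p, demos, d) - s) for d, s in dev_set)
    return min(((p, demos) for p in prompts
                for demos in combinations(demo_pool, k)),
               key=lambda cfg: error(*cfg))

prompts = ["Rate coherence 1-5:", "How coherent is this dialogue (1-5)?"]
demo_pool = ["A: hi B: hello (score 5)", "A: hi B: potato (score 1)",
             "A: how are you B: fine (score 4)"]
dev = [("A: hey B: hey there", 5), ("A: weather? B: banana", 1)]
print(pick_configuration(prompts, demo_pool, dev))
```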


Robust Prompt Optimization For Large Language Models Against Distribution Shifts, Moxin Li, Wenjie Wang, Fuli Feng, Yixin Cao, Jizhi Zhang, Tat-Seng Chua Dec 2023

Research Collection School Of Computing and Information Systems

Large Language Models (LLMs) have demonstrated significant ability in various Natural Language Processing tasks. However, their effectiveness is highly dependent on the phrasing of the task prompt, leading to research on automatic prompt optimization using labeled task data. We reveal that these prompt optimization techniques are vulnerable to distribution shifts such as subpopulation shifts, which are common for LLMs in real-world scenarios such as customer review analysis. In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires that a prompt optimized over the labeled source group simultaneously generalize to an unlabeled …
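
One generic way to make prompt selection distribution-aware is to rank candidate prompts by their worst-group accuracy across labeled subpopulations rather than by average accuracy. The sketch below shows that standard robustness criterion for illustration only, not the paper's algorithm; `llm_label` is a dummy stand-in that ignores the prompt:

```python
# Select a prompt by worst-group accuracy over subpopulations of the
# labeled source data. `llm_label` is a dummy placeholder for an LLM call.

def llm_label(prompt: str, text: str) -> str:
    return "positive" if "good" in text else "negative"   # dummy for demo

def worst_group_accuracy(prompt: str, groups: dict) -> float:
    accs = []
    for examples in groups.values():
        correct = sum(llm_label(prompt, x) == y for x, y in examples)
        accs.append(correct / len(examples))
    return min(accs)   # robustness: judge a prompt by its weakest group

candidates = ["Classify the review sentiment:",
              "Is this review positive or negative?"]
groups = {"electronics": [("good battery", "positive"), ("broke fast", "negative")],
          "books": [("good plot", "positive"), ("dull prose", "negative")]}
best = max(candidates, key=lambda p: worst_group_accuracy(p, groups))
print(best)
```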


A Psychometric Analysis Of Natural Language Inference Using Transformer Language Models, Antonio Laverghetta Jr. Oct 2023

USF Tampa Graduate Theses and Dissertations

Large language models (LLMs) are poised to transform both academia and industry. But the excitement around these generative AIs has also been met with concern about the true extent of their capabilities. This dissertation helps to address these questions by examining the capabilities of LLMs using the tools of psychometrics. We focus on analyzing the capabilities of LLMs on the task of natural language inference (NLI), a foundational benchmark often used to evaluate new models. We demonstrate that LLMs can reliably predict the psychometric properties NLI items would have were those items administered to humans. Through a series of experiments, we …
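
The kind of check such an analysis relies on can be illustrated with a rank correlation between model-derived and human item difficulties. The sketch below uses SciPy's `spearmanr` on toy values; the numbers are placeholders, not results from the dissertation:

```python
# Does a model-derived difficulty estimate (error rate per item) track
# human item difficulty? Toy values for illustration only.
from scipy.stats import spearmanr

model_error_rate = [0.10, 0.35, 0.60, 0.25, 0.80]   # per NLI item (toy)
human_error_rate = [0.15, 0.30, 0.55, 0.20, 0.70]   # from human administration (toy)

rho, p = spearmanr(model_error_rate, human_error_rate)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```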


The Devil Is In The Tails: How Long-Tailed Code Distributions Impact Large Language Models, Xin Zhou, Kisub Kim, Bowen Xu, Jiakun Liu, Donggyun Han, David Lo Sep 2023

Research Collection School Of Computing and Information Systems

Learning-based techniques, especially advanced Large Language Models (LLMs) for code, have gained considerable popularity in various software engineering (SE) tasks. However, most existing works focus on designing better learning-based models and pay less attention to the properties of datasets. Learning-based models, including popular LLMs for code, heavily rely on data, and the data's properties (e.g., data distribution) could significantly affect their behavior. We conducted an exploratory study on the distribution of SE data and found that such data usually follows a skewed distribution (i.e., long-tailed distribution) where a small number of classes have an extensive collection of samples, while a …
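
A long-tailed label distribution of the sort the study reports is easy to inspect: count samples per class and compare the head classes to the tail. A small sketch with made-up SE-style labels:

```python
# Inspect a long-tailed class distribution: a few head classes hold most
# samples while many tail classes have almost none. Labels are toy data.
from collections import Counter

labels = (["fix-bug"] * 500 + ["add-feature"] * 300 +
          ["rename"] * 20 + ["update-license"] * 3 + ["fix-typo"] * 2)
counts = Counter(labels).most_common()
total = sum(c for _, c in counts)
head = counts[: max(1, len(counts) // 5)]          # top 20% of classes
head_share = sum(c for _, c in head) / total
print(f"{len(head)} head class(es) hold {head_share:.0%} of samples")
for cls, c in counts:
    print(f"{cls:>15}: {c}")
```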


Enriched Pre-Trained Transformers For Joint Slot Filling And Intent Detection, Momchil Hardalov, Ivan Koychev, Preslav Nakov Sep 2023

Natural Language Processing Faculty Publications

Detecting the user's intent and finding the corresponding slots among the utterance's words are important tasks in natural language understanding. Their interconnected nature makes their joint modeling a standard part of training such models. Moreover, data scarcity and specialized vocabularies pose additional challenges. Recently, advances in pre-trained language models, namely contextualized models such as ELMo and BERT, have revolutionized the field by tapping the potential of training very large models with just a few steps of fine-tuning on a task-specific dataset. Here, we leverage such models, and we design a novel architecture on top of them. Moreover, we propose …
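
The joint-modeling idea is compact in code: one shared encoder feeds an utterance-level intent head and a token-level slot head, and the two losses are summed. The PyTorch sketch below uses an LSTM stand-in rather than a pre-trained transformer, so it shows the architecture pattern, not the paper's model:

```python
# Joint intent detection and slot filling: shared encoder, two heads,
# summed losses. All sizes and data are placeholders.
import torch
from torch import nn

class JointNLU(nn.Module):
    def __init__(self, vocab, n_intents, n_slots, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.LSTM(dim, dim, batch_first=True)
        self.intent_head = nn.Linear(dim, n_intents)  # utterance-level
        self.slot_head = nn.Linear(dim, n_slots)      # token-level

    def forward(self, tokens):
        h, _ = self.enc(self.emb(tokens))             # (B, T, dim)
        return self.intent_head(h[:, -1]), self.slot_head(h)

model = JointNLU(vocab=1000, n_intents=5, n_slots=10)
tokens = torch.randint(0, 1000, (2, 7))
intent_logits, slot_logits = model(tokens)
intent_loss = nn.functional.cross_entropy(intent_logits, torch.tensor([1, 3]))
slot_loss = nn.functional.cross_entropy(slot_logits.reshape(-1, 10),
                                        torch.randint(0, 10, (14,)))
(intent_loss + slot_loss).backward()                  # joint objective
```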


Plan-And-Solve Prompting: Improving Zero-Shot Chain-Of-Thought Reasoning By Large Language Models, Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, Ee-Peng Lim Jul 2023

Research Collection School Of Computing and Information Systems

Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with “Let’s think step by step” as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It …
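
The contrast between the two zero-shot triggers is easiest to see as templates. The Plan-and-Solve wording below is paraphrased from the trigger the paper describes; treat it as approximate:

```python
# The two zero-shot prompt styles contrasted in the abstract, as templates.
ZERO_SHOT_COT = "{problem}\nA: Let's think step by step."
PLAN_AND_SOLVE = ("{problem}\nA: Let's first understand the problem and "
                  "devise a plan to solve it. Then, let's carry out the plan "
                  "and solve the problem step by step.")

problem = "Q: A shop sells pens in packs of 12. How many pens are in 7 packs?"
print(PLAN_AND_SOLVE.format(problem=problem))
```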


Combating Fake News: A Gravity Well Simulation To Model Echo Chamber Formation In Social Media, Jeremy E. Thompson Jan 2023

Dartmouth College Ph.D Dissertations

Fake news has become a serious concern as distributing misinformation has become easier and more impactful. A solution is critically required. One solution is to ban fake news, but that approach could create more problems than it solves, and would also be problematic from the beginning, as fake news must first be identified before it can be banned. We initially propose a method to automatically recognize suspected fake news, and to provide news consumers with more information as to its veracity. We suggest that fake news comprises two components: premises and misleading content. Fake news can be condensed down to a …


Modeling The Multi-Mode Distribution In Self-Supervised Language Models, Haw-Shiuan Chang Oct 2022

Doctoral Dissertations

Self-supervised large language models (LMs) have become a highly influential and foundational tool for many NLP models. For this reason, their expressivity is an important topic of study. In near-universal practice, given the language context, the model predicts a word from the vocabulary using a single embedded vector representation of both context and dictionary entries. Note that the context sometimes implies that the distribution over predicted words should be multi-modal in embedded space. However, the context’s single-vector representation provably fails to capture such a distribution. To address this limitation, we propose to represent context with multiple vector embeddings, which we term …
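
The limitation and the proposed fix can be demonstrated numerically: a single softmax over dot products concentrates mass around one region of embedding space, while a mixture over several context vectors can place mass on distant words. A small NumPy sketch, with arbitrary dimensions and mixture weights:

```python
# One context vector gives a unimodal softmax over the vocabulary; a
# mixture over multiple context vectors can be multi-modal.
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 32))            # word embeddings
contexts = rng.normal(size=(3, 32))            # multiple context vectors
weights = np.array([0.5, 0.3, 0.2])            # mixture weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

single = softmax(vocab @ contexts[0])                      # one mode
mixture = sum(w * softmax(vocab @ c) for w, c in zip(weights, contexts))
print(single.argmax(), mixture.argsort()[-3:])             # compare top words
```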


Improving Negation Detection With Negation-Focused Pre-Training, Hung Thinh Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor Apr 2022

Natural Language Processing Faculty Publications

Negation is a common linguistic feature that is crucial in many language understanding tasks, yet it remains a hard problem due to diversity in its expression in different types of text. Recent work has shown that state-of-the-art NLP models underperform on samples containing negation in various tasks, and that negation detection models do not transfer well across domains. We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking, to better incorporate negation information into language models. Extensive experiments on common benchmarks show that our proposed approach improves negation detection performance and generalizability over the strong baseline …
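
The negation-masking component can be sketched directly: when building masked-language-model examples, negation cues are masked at a much higher rate than other tokens, forcing the model to learn to predict them. The cue list and rates below are illustrative choices, not the paper's:

```python
# Negation-focused masking for MLM pre-training data: preferentially mask
# negation cues. Cue list and mask rates are assumptions for illustration.
import random

NEGATION_CUES = {"not", "no", "never", "without", "n't", "neither", "nor"}

def negation_mask(tokens: list, cue_rate=0.8, other_rate=0.15) -> list:
    out = []
    for tok in tokens:
        rate = cue_rate if tok.lower() in NEGATION_CUES else other_rate
        out.append("[MASK]" if random.random() < rate else tok)
    return out

print(negation_mask("the scan showed no sign of infection".split()))
```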


Shapley Idioms: Analysing Bert Sentence Embeddings For General Idiom Token Identification, Vasudevan Nedumpozhimana, Filip Klubicka, John Kelleher Jan 2022

Articles

This article examines the basis of Natural Language Understanding in transformer-based language models, such as BERT. It does this through a case study on idiom token classification. We use idiom token identification as a basis for our analysis because of the variety of information types that have previously been explored in the literature for this task, including topic, lexical, and syntactic features. This variety of relevant information types means that the task of idiom token identification enables us to explore the forms of linguistic information that a BERT language model captures and encodes in its representations. The core of …
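
Shapley values over feature groups (e.g., topic, lexical, and syntactic blocks of a representation) can be estimated by sampling permutations and averaging marginal contributions. The sketch below does this for a toy additive scoring function; it shows the attribution technique generically, not the article's exact experimental design:

```python
# Monte Carlo Shapley values over feature groups. `value` is a toy
# stand-in for a classifier's performance given a coalition of groups.
import random

GROUPS = ["topic", "lexical", "syntactic"]
CONTRIB = {"topic": 0.30, "lexical": 0.15, "syntactic": 0.05}  # toy ground truth

def value(coalition: frozenset) -> float:
    return sum(CONTRIB[g] for g in coalition)   # toy additive score

def shapley(samples=2000):
    phi = {g: 0.0 for g in GROUPS}
    for _ in range(samples):
        order = random.sample(GROUPS, len(GROUPS))   # random permutation
        seen = frozenset()
        for g in order:
            phi[g] += value(seen | {g}) - value(seen)  # marginal contribution
            seen = seen | {g}
    return {g: v / samples for g, v in phi.items()}

print(shapley())   # recovers the per-group contributions for an additive game
```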


Achieving Hate Speech Detection In A Low Resource Setting, Peiyu Li May 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Online social networks provide people with convenient platforms to communicate and share life moments. However, because of the anonymity these social media platforms afford, cases of online hate speech are increasing. Hate speech is defined by the Cambridge Dictionary as “public speech that expresses hate or encourages violence towards a person or group based on something such as race, religion, sex, or sexual orientation”. Online hate speech has caused serious negative effects on legitimate users, including mental or emotional stress, reputational damage, and fear for one’s safety. To protect legitimate online users, automatic hate speech detection techniques are …


Research On Image Description Method Based On Neural Network, Kong Rui, Xie Wei, Lei Tai Apr 2020

Journal of System Simulation

Abstract: Automatically recognizing and describing image content is an important research direction in artificial intelligence, connecting computer vision and natural language processing. We propose a method for describing image content that generates natural language using a deep neural network model. The model consists of a convolutional neural network (CNN) and a recurrent neural network (RNN): the CNN extracts features from the input image to produce a fixed-length feature vector, which initializes the RNN to generate sentences. Experimental results on the MSCOCO image description dataset show the syntactic …
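
The CNN-to-RNN handoff the abstract describes fits in a short PyTorch sketch: the CNN's fixed-length feature vector becomes the decoder RNN's initial hidden state. All dimensions and modules below are placeholders, not the paper's configuration:

```python
# Encoder-decoder captioning skeleton: CNN feature vector initializes the
# RNN decoder, which predicts next-word logits at each position.
import torch
from torch import nn

class CaptionModel(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, dim))          # image -> vector
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, image, caption):
        h0 = self.cnn(image).unsqueeze(0)   # feature vector as initial state
        h, _ = self.rnn(self.emb(caption), h0)
        return self.out(h)                  # next-word logits per position

model = CaptionModel()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 5)))
print(logits.shape)   # (2, 5, 1000)
```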


Exploring The Impact Of Pretrained Bidirectional Language Models On Protein Secondary Structure Prediction, Dillon G. Daudert Dec 2018

Masters Theses

Protein secondary structure prediction (PSSP) involves determining the local conformations of the peptide backbone in a folded protein, and is often the first step in resolving a protein's global folded structure. Accurate structure prediction has important implications for understanding protein function and de novo protein design, with progress in recent years being driven by the application of deep learning methods such as convolutional and recurrent neural networks. Language models pretrained on large text corpora have been shown to learn useful representations for feature extraction and transfer learning across problem domains in natural language processing, most notably in instances where the …
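
The transfer set-up under study can be outlined simply: per-residue embeddings from a pretrained language model become input features for a secondary-structure classifier. In the sketch below the embeddings are random placeholders, and the 8-state DSSP label set is an assumed choice:

```python
# Pretrained-LM features feeding a per-residue secondary structure
# classifier. The LM output is faked with random tensors for illustration.
import torch
from torch import nn

N_STATES = 8          # DSSP 8-state secondary structure labels (assumed)
seq_len, lm_dim = 50, 256

lm_features = torch.randn(1, seq_len, lm_dim)   # stand-in for pretrained LM output
classifier = nn.Sequential(nn.Linear(lm_dim, 128), nn.ReLU(),
                           nn.Linear(128, N_STATES))
logits = classifier(lm_features)                # one prediction per residue
print(logits.shape)                             # (1, 50, 8)
```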


Probability Of Belonging To A Language, Kevin Michael Brooks Cook Apr 2013

Theses and Dissertations

Conventional language models estimate the probability that a word sequence within a chosen language will occur. By contrast, the purpose of our work is to estimate the probability that the word sequence belongs to the chosen language. The language of interest in our research is comprehensible well-formed English. We explain how conventional language models assume what we refer to as a degree of generalization, the extent to which a model generalizes from a given sequence. We explain why such an assumption may hinder estimation of the probability that a sequence belongs. We show that the probability that a word sequence …
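
One illustrative way to turn "probability of occurring" into a "probability of belonging" style score is a likelihood ratio against a background model, so sequence length cancels out. This is a generic sketch of the distinction the abstract draws, not the thesis's method:

```python
# Score "belonging" as a per-word likelihood ratio between an in-language
# LM and a uniform background model, squashed to (0, 1).
import math

def belonging_score(logprobs: list, vocab_size: int = 50000) -> float:
    background = -math.log(vocab_size)                 # uniform per-word log-prob
    avg_ratio = sum(lp - background for lp in logprobs) / len(logprobs)
    return 1 / (1 + math.exp(-avg_ratio))              # squash to (0, 1)

well_formed = [-4.2, -3.1, -5.0, -2.8]   # toy per-word log-probs from an LM
gibberish = [-12.5, -11.9, -13.2, -12.0]
print(belonging_score(well_formed), belonging_score(gibberish))
```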


Linking Entities To A Knowledge Base With Query Expansion, Swapna Gottipati, Jing Jiang Jul 2011

Research Collection School Of Computing and Information Systems

In this paper we present a novel approach to entity linking based on statistical language model-based information retrieval with query expansion. We use both local contexts and global world knowledge to expand query language models. We place a strong emphasis on named entities in the local contexts and explore a positional language model to weigh them differently based on their distances to the query. Our experiments on the TAC-KBP 2010 data show that incorporating such contextual information indeed aids in disambiguating the named entities and consistently improves the entity linking performance. Compared with the official results from KBP 2010 …
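
The positional weighting idea can be sketched as follows: context terms enter the expanded query model with weights that decay with distance from the mention. The Gaussian kernel below is an illustrative choice, not necessarily the paper's exact estimator:

```python
# Build an expanded query term distribution where each context term is
# weighted by its distance to the entity mention.
import math
from collections import defaultdict

def expanded_query_model(tokens: list, mention_idx: int, sigma=3.0) -> dict:
    weights = defaultdict(float)
    for i, tok in enumerate(tokens):
        dist = abs(i - mention_idx)
        weights[tok.lower()] += math.exp(-(dist ** 2) / (2 * sigma ** 2))
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}   # normalized distribution

ctx = "the striker joined Chelsea after leaving the Madrid club".split()
model = expanded_query_model(ctx, mention_idx=3)        # mention: "Chelsea"
print(sorted(model.items(), key=lambda kv: -kv[1])[:3])
```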


Novelty Detection For Cross-Lingual News Stories With Visual Duplicates And Speech Transcripts, Xiao Wu, Alexander G. Hauptmann, Chong-Wah Ngo Sep 2007

Research Collection School Of Computing and Information Systems

An overwhelming volume of news videos from different channels and languages is available today, which demands automatic management of this abundant information. To effectively search, retrieve, browse and track cross-lingual news stories, a news story similarity measure plays a critical role in assessing the novelty and redundancy among them. In this paper, we explore the novelty and redundancy detection with visual duplicates and speech transcripts for cross-lingual news stories. News stories are represented by a sequence of keyframes in the visual track and a set of words extracted from speech transcript in the audio track. A major difference to pure …


Near-Duplicate Keyframe Retrieval With Visual Keywords And Semantic Context, Xiao Wu, Wan-Lei Zhao, Chong-Wah Ngo Jul 2007

Research Collection School Of Computing and Information Systems

Near-duplicate keyframes (NDK) play a unique role in large-scale video search, news topic detection and tracking. In this paper, we propose a novel NDK retrieval approach by exploring both visual and textual cues from the visual vocabulary and semantic context, respectively. The vocabulary, which provides entries for visual keywords, is formed by clustering local keypoints. The semantic context is inferred from the speech transcript surrounding a keyframe. We examine the usefulness of visual keywords and semantic context, separately and jointly, using cosine similarity and language models. By linearly fusing both modalities, performance improvement is reported compared with the …
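
The linear fusion step is a one-liner once each modality has a similarity function. A minimal sketch with toy sparse vectors, where the mixing weight alpha is a tunable assumption:

```python
# Linearly fuse visual-keyword similarity and semantic-context similarity.
def cosine(a: dict, b: dict) -> float:
    num = sum(a[k] * b.get(k, 0.0) for k in a)
    den = (sum(v * v for v in a.values()) ** 0.5
           * sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def fused_similarity(vis_a, vis_b, txt_a, txt_b, alpha=0.6):
    return alpha * cosine(vis_a, vis_b) + (1 - alpha) * cosine(txt_a, txt_b)

kf1 = {"vis": {"kp_12": 2.0, "kp_77": 1.0}, "txt": {"election": 1.0, "vote": 1.0}}
kf2 = {"vis": {"kp_12": 1.0, "kp_90": 1.0}, "txt": {"election": 1.0, "poll": 1.0}}
print(fused_similarity(kf1["vis"], kf2["vis"], kf1["txt"], kf2["txt"]))
```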