Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Transformer

Articles 1 - 30 of 61

Full-Text Articles in Physical Sciences and Mathematics

Comparative Analysis Of Models For Predicting Stock Option Volatility, Michael Veino Aug 2024

Electronic Theses and Dissertations

This thesis aims to compare existing methodologies against new, transformer-based deep neural networks in predicting implied volatility (IV) of stock options. The implied volatility reflects investor sentiment regarding the underlying stock and provides insight into how the asset may move in price in the near future. Accurate prediction of IV can help investors allocate their holdings and improve option strategies to reduce risk in the process. As researchers test newer, more advanced models for predicting IV, the results improve when using traditional regression metrics such as root mean squared error (RMSE), but not when considering the Sharpe Ratio or how …


A Highly Robust Target Tracking Algorithm Merging CNN And Transformer, Peijin Liu, Xuefeng Fu, Haofeng Sun, Lin He, Shujie Liu Aug 2024

Journal of System Simulation

Abstract: To address the performance degradation of target tracking algorithms caused by target object deformation, scale variation, fast motion, and occlusion, a highly robust target tracking algorithm that merges a CNN and a Transformer is proposed, based on a Siamese network architecture. In the feature extraction stage, standard convolutions are employed to extract shallow local feature information, while a convolution-like Transformer module is designed in the deep network to model global information. The pixel values in the Transformer are computed using a sliding window, significantly reducing computational complexity. In the feature aggregation stage, a multi-head cross-attention module is utilized to construct a …


Hierarchical Damage Correlations For Old Photo Restoration, Weiwei Cai, Xuemiao Xu, Jiajia Xu, Huaidong Zhang, Haoxin Yang, Kun Zhang, Shengfeng He Jul 2024

Research Collection School Of Computing and Information Systems

Restoring old photographs can preserve cherished memories. Previous methods handled diverse damages within the same network structure, which proved impractical. In addition, these methods cannot exploit correlations among artifacts, especially in scratches versus patch-misses issues. Hence, a tailored network is particularly crucial. In light of this, we propose a unified framework consisting of two key components: ScratchNet and PatchNet. In detail, ScratchNet employs the parallel Multi-scale Partial Convolution Module to effectively repair scratches, learning from multi-scale local receptive fields. In contrast, the patch-misses require the network to emphasize global information. To this end, we incorporate a transformer-based encoder and decoder …


On The Feasibility Of Simple Transformer For Dynamic Graph Modeling, Yuxia Wu, Yuan Fang, Lizi Liao May 2024

Research Collection School Of Computing and Information Systems

Dynamic graph modeling is crucial for understanding complex structures in web graphs, spanning applications in social networks, recommender systems, and more. Most existing methods primarily emphasize structural dependencies and their temporal changes. However, these approaches often overlook detailed temporal aspects or struggle with long-term dependencies. Furthermore, many solutions overly complicate the process by emphasizing intricate module designs to capture dynamic evolutions. In this work, we harness the strength of the Transformer’s self-attention mechanism, known for adeptly handling long-range dependencies in sequence modeling. Our approach offers a simple Transformer model, called SimpleDyG, tailored for dynamic graph modeling without complex modifications. We …


LMCrot: An Enhanced Protein Crotonylation Site Predictor By Leveraging An Interpretable Window-Level Embedding From A Transformer-Based Protein Language Model, Pawel Pratyush, Soufia Bahmani, Suresh Pokharel, Hamid D. Ismail, Dukka Bahadur Apr 2024

Michigan Tech Publications, Part 2

MOTIVATION: Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from Protein Language Models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted. RESULTS: Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed …


Historical Perspectives In Volatility Forecasting Methods With Machine Learning, Zhiang Qiu, Clemens Kownatzki, Fabien Scalzo, Eun Sang Cha Mar 2024

Seaver College Research And Scholarly Achievement Symposium

Volatility forecasting in the financial market plays a pivotal role across a spectrum of disciplines, such as risk management, option pricing, and market making. However, volatility forecasting is challenging because volatility can only be estimated, and different factors influence volatility, ranging from macroeconomic indicators to investor sentiments. While recent works suggest advances in machine learning and artificial intelligence for volatility forecasting, a comprehensive benchmark of current statistical and learning-based methods for such purposes is lacking. Thus, this paper aims to provide a comprehensive survey of the historical evolution of volatility forecasting with a comparative benchmark of key landmark models. We …


Machine Learning Approaches In Comparative Studies For Alzheimer’s Diagnosis Using 2D MRI Slices, Zhen Zhao, Joon Huang Chuah, Chee-Onn Chow, Kaijian Xia, Yee Kai Tee, Yan Chai Hum, Khin Wee Lai Feb 2024

Turkish Journal of Electrical Engineering and Computer Sciences

Alzheimer’s disease (AD) is an illness that involves a gradual and irreversible degeneration of the brain. It is crucial to establish a precise diagnosis of AD early on in order to enable prompt therapies and prevent further deterioration. Researchers are currently focusing increasing attention on investigating the potential of machine learning techniques to simplify the automated diagnosis of AD using neuroimaging. The present study involved a comparison of models for the detection of AD through the utilization of 2D image slices obtained from magnetic resonance imaging brain scans. Five models, namely ResNet, ConvNeXt, CaiT, Swin Transformer, and CVT, were implemented …


XLNet4Rec: Modeling User’s Long-Term And Short-Term Interests In E-Commerce Recommender Systems, Namarta Vij Jan 2024

Electronic Theses and Dissertations

In e-commerce, a sequential recommender system is often used to predict the item that the user is likely to select next. This prediction can be used to create a recommender system to assist the user in making selections. However, when the user’s interests evolve over time, it becomes challenging to make such personalized recommendations. A more accurate recommender system thus needs to effectively interpret and adapt to a user’s changing interests by considering a user’s long-term and short-term interests. Many attention-based methods focus on a user’s last clicked item to learn short-term interests. However, this approach may not consistently represent the …


De Novo Drug Design Using Transformer-Based Machine Translation And Reinforcement Learning Of An Adaptive Monte Carlo Tree Search, Dony Ang, Cyril Rakovski, Hagop S. Atamian Jan 2024

Biology, Chemistry, and Environmental Sciences Faculty Articles and Research

The discovery of novel therapeutic compounds through de novo drug design represents a critical challenge in the field of pharmaceutical research. Traditional drug discovery approaches are often resource intensive and time consuming, leading researchers to explore innovative methods that harness the power of deep learning and reinforcement learning techniques. Here, we introduce a novel drug design approach called drugAI that leverages the Encoder–Decoder Transformer architecture in tandem with Reinforcement Learning via a Monte Carlo Tree Search (RL-MCTS) to expedite the process of drug discovery while ensuring the production of valid small molecules with drug-like characteristics and strong binding affinities towards …


Study Of Augmentations On Historical Manuscripts Using TrOCR, Erez Meoded Dec 2023

Theses and Dissertations

Historical manuscripts are an essential source of original content. For many reasons, it is hard to recognize these manuscripts as text. This thesis used a state-of-the-art Handwritten Text Recognizer, TrOCR, to recognize a 16th-century manuscript. TrOCR uses a vision transformer to encode the input images and a language transformer to decode them back to text. We showed that carefully preprocessed images and designed augmentations can improve the performance of TrOCR. We suggest an ensemble of augmented models to achieve an even better performance.


A Bridge Between Graph Neural Networks And Transformers: Positional Encodings As Node Embeddings, Bright Kwaku Manu Dec 2023

Electronic Theses and Dissertations

Graph Neural Networks and Transformers are very powerful frameworks for machine learning tasks. While they evolved separately in diverse fields, current research has revealed some similarities and links between them. This work focuses on bridging the gap between GNNs and Transformers by offering a uniform framework that highlights their similarities and distinctions. We perform positional encodings and identify key properties that make the positional encodings node embeddings. We found that the properties of expressiveness, efficiency and interpretability were achieved in the process. We saw that it is possible to use positional encodings as node embeddings, which can be …


Monocular Depth Estimation For Glass Walls With Context: A New Dataset And Method, Yuan Liang, Bailin Deng, Wenxi Liu, Jing Qin, Shengfeng He Dec 2023

Research Collection School Of Computing and Information Systems

Traditional monocular depth estimation assumes that all objects are reliably visible in the RGB color domain. However, this is not always the case as more and more buildings are decorated with transparent glass walls. This problem has not been explored due to the difficulties in annotating the depth levels of glass walls, as commercial depth sensors cannot provide correct feedback on transparent objects. Furthermore, estimating depths from transparent glass walls requires the aid of surrounding context, which has not been considered in prior works. To cope with this problem, we introduce the first Glass Walls Depth Dataset (GW-Depth dataset). We …


MetaFormer Baselines For Vision, Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang Nov 2023

Research Collection School Of Computing and Information Systems

MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of MetaFormer, again, by migrating our focus away from the token mixer design: we introduce several baseline models under MetaFormer using the most basic or common mixers, and demonstrate their gratifying performance. We summarize our observations as follows: (1) MetaFormer ensures a solid lower bound of performance. By merely adopting identity mapping as the token mixer, the MetaFormer model, termed IdentityFormer, achieves >80% accuracy on ImageNet-1K. (2) MetaFormer works well with arbitrary token mixers. When …


A Unified Query-Based Paradigm For Camouflaged Instance Segmentation, Bo Dong, Jialun Pei, Rongrong Gao, Tian Zhu Xiang, Shuo Wang, Huan Xiong Oct 2023

Machine Learning Faculty Publications

Due to the high similarity between camouflaged instances and the background, the recently proposed camouflaged instance segmentation (CIS) faces challenges in accurate localization and instance segmentation. To this end, inspired by query-based transformers, we propose a unified query-based multi-task learning framework for camouflaged instance segmentation, termed UQFormer, which builds a set of mask queries and a set of boundary queries to learn a shared composed query representation and efficiently integrates global camouflaged object region and boundary cues, for simultaneous instance segmentation and instance boundary detection in camouflaged scenarios. Specifically, we design a composed query learning paradigm that learns a shared …


DTITD: An Intelligent Insider Threat Detection Framework Based On Digital Twin And Self-Attention Based Deep Learning Models, Zhi Qiang Wang, Abdulmotaleb El Saddik Oct 2023

Computer Vision Faculty Publications

Recent statistics and studies show that the loss generated by insider threats is much higher than that generated by external attacks. More and more organizations are investing in or purchasing insider threat detection systems to prevent insider risks. However, the accurate and timely detection of insider threats faces significant challenges. In this study, we propose an intelligent insider threat detection framework based on Digital Twins and self-attention-based deep learning models. First, this paper introduces insider threats and the challenges in detecting them. Then this paper presents recent related works on solving insider threat detection problems and their limitations. Next, …


Focal Modulation Network For Lung Segmentation In Chest X-Ray Images, Şaban Öztürk, Tolga Çukur Oct 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Segmentation of lung regions is of key importance for the automatic analysis of Chest X-Ray (CXR) images, which have a vital role in the detection of various pulmonary diseases. Precise identification of lung regions is the basic prerequisite for disease diagnosis and treatment planning. However, achieving precise lung segmentation poses significant challenges due to factors such as variations in anatomical shape and size, the presence of strong edges at the rib cage and clavicle, and overlapping anatomical structures resulting from diverse diseases. Although commonly considered as the de-facto standard in medical image segmentation, the convolutional UNet architecture and its variants …


A Novel Fuzzy Relative-Position-Coding Transformer For Breast Cancer Diagnosis Using Ultrasonography, Yanhui Guo, Ruquan Jiang, Xin Gu, Heng-Da Cheng, Harish Garg Sep 2023

Computer Science Faculty and Staff Publications

Breast cancer is a leading cause of death in women worldwide, and early detection is crucial for successful treatment. Computer-aided diagnosis (CAD) systems have been developed to assist doctors in identifying breast cancer on ultrasound images. In this paper, we propose a novel fuzzy relative-position-coding (FRPC) Transformer to classify breast ultrasound (BUS) images for breast cancer diagnosis. The proposed FRPC Transformer utilizes the self-attention mechanism of Transformer networks combined with fuzzy relative-position-coding to capture global and local features of the BUS images. The performance of the proposed method is evaluated on one benchmark dataset and compared with those obtained by …


Emotion-Aware Music Recommendation, Hieu Tran, Tuan Le, Anh Do, Tram Vu, Steven Bogaerts, Brian T. Howard Sep 2023

Computer Science Faculty publications

It is common to listen to songs that match one's mood. Thus, an AI music recommendation system that is aware of the user's emotions is likely to provide a superior user experience to one that is unaware. In this paper, we present an emotion-aware music recommendation system. Multiple models are discussed and evaluated for affect identification from a live image of the user. We propose two models: DRViT, which applies dynamic routing to vision transformers, and InvNet50, which uses involution. All considered models are trained and evaluated on the AffectNet dataset. Each model outputs the user's estimated valence and arousal …


Gpachov At CheckThat! 2023: A Diverse Multi-Approach Ensemble For Subjectivity Detection In News Articles, Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov Sep 2023

Natural Language Processing Faculty Publications

The wide-spread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered …


Self-Supervised Pretraining And Transfer Learning On fMRI Data With Transformers, Sean Paulsen Aug 2023

Dartmouth College Ph.D Dissertations

Transfer learning is a machine learning technique founded on the idea that knowledge acquired by a model during “pretraining” on a source task can be transferred to the learning of a target task. Successful transfer learning can result in improved performance, faster convergence, and reduced demand for data. This technique is particularly desirable for the task of brain decoding in the domain of functional magnetic resonance imaging (fMRI), wherein even the most modern machine learning methods can struggle to decode labelled features of brain images. This challenge is due to the highly complex underlying signal, physical and neurological differences between …


The Student Becomes The Teacher: Training High-Performance Language Models More Sample-Efficiently From Small Models Via Superstilling, Chaz Allen Gundry Aug 2023

Theses and Dissertations

Recent advances including the Transformer architecture have revolutionized the Natural Language Processing community by providing immense performance improvements across many tasks, including the development of Large Language Models (LLMs). LLMs show enormous promise as few-shot learners, common-sense knowledge repositories, conversational agents, writing assistants, and coding tools, and are gaining widespread traction in commercial industry. However, LLMs are expensive and time-consuming to train, requiring many passes over terabytes of data for the largest models. In this paper, we present Superstilling, a method for reducing the sample complexity of language model training by distilling the knowledge from a previously-trained model (the teacher) …


Traditional Vs Machine Learning Approaches: A Comparison Of Time Series Modeling Methods, Miguel E. Bonilla Jr., Jason Mcdonald, Tamas Toth, Bivin Sadler Aug 2023

SMU Data Science Review

In recent years, various new Machine Learning and Deep Learning algorithms have been introduced, claiming to offer better performance than traditional statistical approaches when forecasting time series. Studies seeking evidence to support the usage of ML/DL over statistical approaches have been limited to comparing the forecasting performance of univariate, linear time series data. This research compares the performance of traditional statistical-based and ML/DL methods for forecasting multivariate and nonlinear time series.


Contrastive Video Question Answering Via Video Graph Transformer, Junbin Xiao Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua Jul 2023

Research Collection School Of Computing and Information Systems

We propose to perform video question answering (VideoQA) in a Contrastive manner via a Video Graph Transformer model (CoVGT). CoVGT’s uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning. 2) It designs separate video and text transformers for contrastive learning between the video and text to perform QA, instead of multi-modal transformer for answer classification. Fine-grained video-text communication is done by additional cross-modal interaction modules. 3) It is optimized by the joint fully- and self-supervised contrastive objectives between the …


3D Dental Biometrics: Transformer-Based Dental Arch Extraction And Matching, Zhiyuan Zhang, Zhong Xin Jun 2023

Research Collection School Of Computing and Information Systems

The dental arch is a significant anatomical feature that is crucial in assessing tooth arrangement and configuration and has a potential for human identification in biometrics and digital forensic dentistry. In a previous study, we proposed an auto pose-invariant arch feature extraction Radial Ray Algorithm (RRA) and a matching framework [1] based solely on 3D dental geometry. To enhance the identification accuracy and speed of our previous work, we propose in this study a transformer architecture that can extract dental keypoints by encoding both local and global features. The dental arch is then constructed through robust interpolation of the dental …


Question Answering With Distilled BERT Models: A Case Study For Biomedical Data, Brittany Lewandowski, Rayon Morris, Pearly Merin Paul, Robert Slater Apr 2023

SMU Data Science Review

In the healthcare industry today, 80% of data is unstructured (Razzak et al., 2019). The challenge this imposes on healthcare providers is that they rely on unstructured data to inform their decision-making. Although Electronic Health Records (EHRs) exist to integrate patient data, healthcare providers are still challenged with searching for information and answers contained within unstructured data. Prior NLP and Deep Learning research has shown that these methods can improve information extraction on unstructured medical documents. This research expands upon those studies by developing a Question Answering system using distilled BERT models. Healthcare providers can use this system on their …


Language Modeling Using Image Representations Of Natural Language, Seong Eun Cho Apr 2023

Theses and Dissertations

This thesis presents training of an end-to-end autoencoder model using the transformer, with an encoder that can encode sentences into fixed-length latent vectors and a decoder that can reconstruct the sentences using image representations. Encoding and decoding sentences to and from these image representations are central to the model design. This method allows new sentences to be generated by traversing the Euclidean space, which makes vector arithmetic possible using sentences. Machines excel in dealing with concrete numbers and calculations, but do not possess an innate infrastructure designed to help them understand abstract concepts like natural language. In order for a …


Improving Rumor Detection By Promoting Information Campaigns With Transformer-Based Generative Adversarial Learning, Jing Ma, Jun Li, Wei Gao, Yang Yang, Kam-Fai Wong Mar 2023

Research Collection School Of Computing and Information Systems

Rumors can cause devastating consequences to individuals and our society. Analysis shows that the wide spread of rumors typically results from deliberate promotion of information aiming to shape the collective public opinions on the concerned event. In this paper, we combat this chaotic phenomenon with a countermeasure that mirrors how such chaos is created, to make rumor detection more robust and effective. Our idea is inspired by the adversarial learning method originating from Generative Adversarial Networks (GAN). We propose a GAN-style approach, where a generator is designed to produce uncertain or conflicting voices, further polarizing the original conversational threads to boost …


Disagreement Matters: Exploring Internal Diversification For Redundant Attention In Generic Facial Action Analysis, Xiaotian Li, Zheng Zhang, Xiang Zhang, Taoyue Wang, Zhihua Li, Huiyuan Yang, Umur Ciftci, Qiang Ji, Jeffrey Cohn, Lijun Yin Jan 2023

Computer Science Faculty Research & Creative Works

This paper demonstrates the effectiveness of a diversification mechanism for building a more robust multi-attention system in generic facial action analysis. While previous multi-attention (e.g., visual attention and self-attention) research on facial expression recognition (FER) and Action Unit (AU) detection has focused on "external attention diversification", where attention branches localize different facial areas, we delve into the realm of "internal attention diversification" and explore the impact of diverse attention patterns within the same Region of Interest (RoI). Our experiments reveal that variability in attention patterns significantly impacts model performance, indicating that unconstrained multi-attention plagued by redundancy …


Comparative Analysis Of Transformer-Based Models For Text-To-Speech Normalization, Pankti Dholakia Jan 2023

Master's Projects

Text-to-Speech (TTS) normalization is an essential component of natural language processing (NLP) that plays a crucial role in the production of natural-sounding synthesized speech. However, there are limitations to the TTS normalization procedure. Lengthy input sequences and variations in spoken language can present difficulties. The motivation behind this research is to address the challenges associated with TTS normalization by evaluating and comparing the performance of various models. The aim is to determine their effectiveness in handling language variations. The models include LSTM-GRU, Transformer, GCN-Transformer, GCNN-Transformer, Reformer, and a pre-trained BERT language model. The research evaluates the performance …


Multimodal Emotion Analysis With Focused Attention, Siddhi Kiran Bajracharya Jan 2023

Dissertations and Theses

Emotion analysis, a subset of sentiment analysis, involves the study of a wide array of emotional indicators. In contrast to sentiment analysis, which restricts its focus to positive and negative sentiments, emotion analysis extends beyond these limitations to a diverse spectrum of emotional cues. Contemporary trends in emotion analysis lean toward multimodal approaches that leverage audiovisual and text modalities. However, implementing multimodal strategies introduces its own set of challenges, marked by a rise in model complexity and an expansion of parameters, thereby creating a need for a larger volume of data. This thesis responds to this challenge by proposing a …