Open Access. Powered by Scholars. Published by Universities.®

Graphics and Human Computer Interfaces Commons

2,080 Full-Text Articles, 3,210 Authors, 682,343 Downloads, 161 Institutions

All Articles in Graphics and Human Computer Interfaces

Hierarchical Damage Correlations For Old Photo Restoration, Weiwei CAI, Xuemiao XU, Jiajia XU, Huaidong ZHANG, Haoxin YANG, Kun ZHANG, Shengfeng HE 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Restoring old photographs can preserve cherished memories. Previous methods handled diverse damages within the same network structure, which proved impractical. In addition, these methods cannot exploit correlations among artifacts, especially in scratches versus patch-misses issues. Hence, a tailored network is particularly crucial. In light of this, we propose a unified framework consisting of two key components: ScratchNet and PatchNet. In detail, ScratchNet employs the parallel Multi-scale Partial Convolution Module to effectively repair scratches, learning from multi-scale local receptive fields. In contrast, the patch-misses necessitate the network to emphasize global information. To this end, we incorporate a transformer-based encoder and decoder …


Diffusion-Based Negative Sampling On Graphs For Link Prediction, Yuan FANG, Yuan FANG 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Link prediction is a fundamental task for graph analysis with important applications on the Web, such as social network analysis and recommendation systems. Modern graph link prediction methods often employ a contrastive approach to learn robust node representations, where negative sampling is pivotal. Typical negative sampling methods aim to retrieve hard examples based on either predefined heuristics or automatic adversarial approaches, which might be inflexible or difficult to control. Furthermore, in the context of link prediction, most previous methods sample negative nodes from existing substructures of the graph, missing out on potentially more optimal samples in the latent space. …


MultiGPrompt For Multi-Task Pre-Training And Prompting On Graphs, Xingtong YU, Chang ZHOU, Yuan FANG, Xinming ZHANG 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availability of task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, these efforts primarily leverage a single pretext task, resulting in a limited subset of general knowledge that could be learned from the …


Multi-Script Handwriting Identification By Fragmenting Strokes, Joshua Jude Thomas 2024 University of South Alabama

Theses and Dissertations

This study tests the effectiveness of Multi-Script Handwriting Identification after simplifying character strokes, by segmenting them into sub-parts. Character simplification is performed through splitting the character by branching-points and end-points, a process called stroke fragmentation in this study. The resulting sub-parts of the character are called stroke fragments and are evaluated individually to identify the writer. This process shares similarities with the concept of stroke decomposition in Optical Character Recognition which attempts to recognize characters through the writing strokes that make them up. The main idea of this study is that the characters of different writing‑scripts (English, Chinese, etc.) may …


An Empirical Study On The Efficacy Of LLM-Powered Chatbots In Basic Information Retrieval Tasks, Naja Faysal 2024 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

The rise of conversational user interfaces (CUIs) powered by large language models (LLMs) is transforming human-computer interaction. This study evaluates the efficacy of LLM-powered chatbots, trained on website data, compared to browsing websites for finding information about organizations across diverse sectors. A within-subjects experiment with 165 participants was conducted, involving similar information retrieval (IR) tasks using both websites (GUIs) and chatbots (CUIs). The research questions are: (Q1) Which interface helps users find information faster: LLM chatbots or websites? (Q2) Which interface helps users find more accurate information: LLM chatbots or websites? The findings are: (Q1) Participants found information significantly faster …


Binder, Tyler A. Peaster, Lindsey M. Davenport, Madelyn Little, Alex Bales 2024 Arkansas Tech University

ATU Research Symposium

Binder is a mobile application that aims to introduce readers to a book recommendation service that appeals to devoted and casual readers. The main goal of Binder is to enrich book selection and reading experience. This project was created in response to deficiencies in the mobile space for book suggestions, library management, and reading personalization. The tools we used to create the project include Visual Studio, the .NET MAUI framework, C#, XAML, CSS, MongoDB, NoSQL, Git, GitHub, and Figma. The project’s selection of books was sourced from the Google Books repository. Binder aims to provide an intuitive interface that allows users …


Factors Influencing The Perceptions Of Human-Computer Interaction Curriculum Developers In Higher Education Institutions During Curriculum Design And Delivery, Cynthia Augustine, Salah Kabanda 2024 Department of Information Systems, University of Cape Town

The African Journal of Information Systems

Computer science (CS) and information systems students seeking to work as software developers upon graduating are often required to create software that has a sound user experience (UX) and meets the needs of its users. This includes addressing unique user, context, and infrastructural requirements. This study sought to identify the factors that influence the perceptions of human-computer interaction (HCI) curriculum developers in higher education institutions (HEIs) in developing economies of Africa when it comes to curriculum design and delivery. A qualitative enquiry was conducted and consisted of fourteen interviews with HCI curriculum developers and UX practitioners in four African countries. …


Immersive Japanese Language Learning Web Application Using Spaced Repetition, Active Recall, And An Artificial Intelligent Conversational Chat Agent Both In Voice And In Text, Marc Butler 2024 Southern Adventist University

MS in Computer Science Project Reports

In the last two decades various human language learning applications, spaced repetition software, online dictionaries, and artificial intelligent chat agents have been developed. However, there is no solution to cohesively combine these technologies into a comprehensive language learning application including skills such as speaking, typing, listening, and reading. Our contribution is to provide an immersive language learning web application to the end user which combines spaced repetition, a study technique used to review information at systematic intervals, and active recall, the process of purposely retrieving information from memory during a review session, with an artificial intelligent conversational chat agent both …


Image De‑Photobombing Benchmark, Vatsa S. Patel, Kunal Agrawal, Samah Baraheem, Amira Yousif, Tam Nguyen 2024 University of Dayton

Computer Science Faculty Publications

Removing photobombing elements from images is a challenging task that requires sophisticated image inpainting techniques. Despite the availability of various methods, their effectiveness depends on the complexity of the image and the nature of the distracting element. To address this issue, we conducted a benchmark study to evaluate 10 state-of-the-art photobombing removal methods on a dataset of over 300 images. Our study focused on identifying the most effective image inpainting techniques for removing unwanted regions from images. We annotated the photobombed regions that require removal and evaluated the performance of each method using peak signal-to-noise ratio (PSNR), structural similarity index …


Terry Riley's "In C" For Mobile Ensemble, David B. Wetzel, Griffin Moe, George K. Thiruvathukal 2024 Loyola University Chicago

Computer Science: Faculty Publications and Other Works

This workshop presents a mobile-friendly Web Audio application for a “technology ensemble play-along” of Terry Riley’s 1964 composition In C. Attendees will join in a reading of In C using available web-enabled devices as musical instruments. We hope to demonstrate an accessible music-technology experience that relies on face-to-face interaction within a shared space. In this all-electronic implementation, no special musical or technical expertise is required.

Accepted for presentation and publication at WAC 2024.


TranSiam: Aggregating Multi-Modal Visual Features With Locality For Medical Image Segmentation, Xuejian LI, Shiqiang MA, Junhai XU, Jijun TANG, Shengfeng HE, Fei GUO 2024 Central South University

Research Collection School Of Computing and Information Systems

Automatic segmentation of medical images plays an important role in the diagnosis of diseases. On single-modal data, convolutional neural networks have demonstrated satisfactory performance. However, multi-modal data encompasses a greater amount of information than single-modal data. Multi-modal data can be effectively used to improve the segmentation accuracy of regions of interest by analyzing both spatial and temporal information. In this study, we propose a dual-path segmentation model for multi-modal medical images, named TranSiam. Taking into account that there is a significant diversity between the different modalities, TranSiam employs two parallel CNNs to extract the features which are specific to …


HGPrompt: Bridging Homogeneous And Heterogeneous Graphs For Few-Shot Prompt Learning, Xingtong YU, Yuan FANG, Zemin LIU, Xinming ZHANG 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on self-supervised pretext tasks has become a popular paradigm, but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction, especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been …


What Does One Billion Dollars Look Like?: Visualizing Extreme Wealth, William Mahoney Luckman 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

The word “billion” is a mathematical abstraction related to “big,” but it is difficult to understand the vast difference in value between one million and one billion; even harder to understand the vast difference in purchasing power between one billion dollars, and the average U.S. yearly income. Perhaps most difficult to conceive of is what that purchasing power and huge mass of capital translates to in terms of power. This project blends design, text, facts, and figures into an interactive narrative website that helps the user better understand their position in relation to extreme wealth: https://whatdoesonebilliondollarslooklike.website/

The site incorporates …


FoodMask: Real-Time Food Instance Counting, Segmentation And Recognition, Huu-Thanh NGUYEN, Yu CAO, Chong-wah NGO, Wing-Kwong CHAN 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Food computing has long been studied and deployed to several applications. Understanding a food image at the instance level, including recognition, counting and segmentation, is essential to quantifying nutrition and calorie consumption. Nevertheless, existing techniques are limited to either category-specific instance detection, which does not reflect precisely the instance size at the pixel level, or category-agnostic instance segmentation, which is insufficient for dish recognition. This paper presents a compact and fast multi-task network, namely FoodMask, for clustering-based food instance counting, segmentation and recognition. The network learns a semantic space simultaneously encoding food category distribution and instance height at pixel basis. …


Leveraging LLMs And Generative Models For Interactive Known-Item Video Search, Zhixin MA, Jiaxin WU, Chong-wah NGO 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

While embedding techniques such as CLIP have considerably boosted search performance, user strategies in interactive video search still largely operate on a trial-and-error basis. Users are often required to manually adjust their queries and carefully inspect the search results, which greatly rely on the users’ capability and proficiency. Recent advancements in large language models (LLMs) and generative models offer promising avenues for enhancing interactivity in video retrieval and reducing the personal bias in query interpretation, particularly in the known-item search. Specifically, LLMs can expand and diversify the semantics of the queries while avoiding grammar mistakes or the language barrier. In …


Delving Into Multimodal Prompting For Fine-Grained Visual Classification, Xin JIANG, Hao TANG, Junyao GAO, Xiaoyu DU, Shengfeng HE, Zechao LI 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Fine-grained visual classification (FGVC) involves categorizing fine subdivisions within a broader category, which poses challenges due to subtle inter-class discrepancies and large intra-class variations. However, prevailing approaches primarily focus on uni-modal visual concepts. Recent advancements in pre-trained vision-language models have demonstrated remarkable performance in various high-level vision tasks, yet the applicability of such models to FGVC tasks remains uncertain. In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model. Our MP-FGVC comprises a multimodal prompts …


Simple Image-Level Classification Improves Open-Vocabulary Object Detection, Ruohuan FANG, Guansong PANG, Xiao BAI 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Open-Vocabulary Object Detection (OVOD) aims to detect novel objects beyond a given set of base categories on which the detection model is trained. Recent OVOD methods focus on adapting the image-level pre-trained vision-language models (VLMs), such as CLIP, to a region-level object detection task via, e.g., region-level knowledge distillation, regional prompt learning, or region-text pre-training, to expand the detection vocabulary. These methods have demonstrated remarkable performance in recognizing regional visual concepts, but they are weak in exploiting the VLMs' powerful global scene understanding ability learned from the billion-scale image-level text descriptions. This limits their capability in detecting hard objects of …


M3SA: Multimodal Sentiment Analysis Based On Multi-Scale Feature Extraction And Multi-Task Learning, Changkai LIN, Hongju CHENG, Qiang RAO, Yang YANG 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Sentiment analysis plays an indispensable part in human-computer interaction. Multimodal sentiment analysis can overcome the shortcomings of unimodal sentiment analysis by fusing multimodal data. However, how to extract improved feature representations and how to execute effective modality fusion are two crucial problems in multimodal sentiment analysis. Traditional work uses simple sub-models for feature extraction; these ignore features of different scales and fuse different modalities of data equally, making it easier to incorporate extraneous information and degrade analysis accuracy. In this paper, we propose a Multimodal Sentiment Analysis model based on Multi-scale feature extraction and Multi-task learning (M3SA). …


CATNet: Cross-Modal Fusion For Audio-Visual Speech Recognition, Xingmei WANG, Jianchen MI, Boquan LI, Yixu ZHAO, Jiaxiang MENG 2024 Singapore Management University

Research Collection School Of Computing and Information Systems

Automatic speech recognition (ASR) is a typical pattern recognition technology that converts human speech into text. With the aid of advanced deep learning models, the performance of speech recognition has significantly improved. In particular, the emerging Audio–Visual Speech Recognition (AVSR) methods achieve satisfactory performance by combining audio-modal and visual-modal information. However, various complex environments, especially noisy ones, limit the effectiveness of existing methods. In response to the noise problem, in this paper, we propose a novel cross-modal audio–visual speech recognition model, named CATNet. First, we devise a cross-modal bidirectional fusion model to analyze the close relationship between audio and visual modalities. Second, …


Digitizing Delphi: Educating Audiences Through Virtual Reconstruction, Kate Koury 2024 Purdue University

The Journal of Purdue Undergraduate Research

Implementing a 3D model into a virtual space allows the general public to engage critically with archaeological processes. There are many unseen decisions that go into reconstructing an ancient temple. Analysis of available materials and techniques, predictions of how objects were used, decisions of what sources to reference, puzzle piecing broken remains together, and even educated guesses used to fill gaps in information often go unobserved by the public. This work will educate users about those choices by allowing the side-by-side comparison of conflicting theories on the reconstruction of the Tholos at Delphi, which is an ideal site because of …

