Open Access. Powered by Scholars. Published by Universities.®
Graphics and Human Computer Interfaces Commons™
Open Access. Powered by Scholars. Published by Universities.®
- Institution
- Keyword
-
- Deep Learning (2)
- Feature extraction (2)
- Information Retrieval (2)
- Machine Learning (2)
- Out-of-turn interaction (2)
-
- Adaptation models (1)
- Advertisement (1)
- Agent network; Breakings; Linear spaces; Linear time; Many to many; Novel domain; Object information; Query video; Single frames; Video objects segmentations (1)
- Algorithm design and analysis (1)
- Algorithms (1)
- Analytical models (1)
- Artificial Intelligence (AI) (1)
- Attention mechanism (1)
- Audio-visual speech recognition (1)
- Automated Telephone Services (ATS) (1)
- Automatic Speech Recognition (ASR) (1)
- Benefits of AI (1)
- Bias in AI Systems (1)
- Case based reasoning (1)
- Classification (1)
- Clustering (1)
- Codes (1)
- Computational modeling (1)
- Computer programs (1)
- Computer science (1)
- Computer-aided Design (1)
- Conceptual Investigation (1)
- Context modeling (1)
- Convolution block attention module (1)
- Cross-modal fusion (1)
- Publication Year
Articles 1 - 30 of 51
Full-Text Articles in Graphics and Human Computer Interfaces
Catnet: Cross-Modal Fusion For Audio-Visual Speech Recognition, Xingmei Wang, Jianchen Mi, Boquan Li, Yixu Zhao, Jiaxiang Meng
Catnet: Cross-Modal Fusion For Audio-Visual Speech Recognition, Xingmei Wang, Jianchen Mi, Boquan Li, Yixu Zhao, Jiaxiang Meng
Research Collection School Of Computing and Information Systems
Automatic speech recognition (ASR) is a typical pattern recognition technology that converts human speeches into texts. With the aid of advanced deep learning models, the performance of speech recognition is significantly improved. Especially, the emerging Audio–Visual Speech Recognition (AVSR) methods achieve satisfactory performance by combining audio-modal and visual-modal information. However, various complex environments, especially noises, limit the effectiveness of existing methods. In response to the noisy problem, in this paper, we propose a novel cross-modal audio–visual speech recognition model, named CATNet. First, we devise a cross-modal bidirectional fusion model to analyze the close relationship between audio and visual modalities. Second, …
Tracking People Across Ultra Populated Indoor Spaces By Matching Unreliable Wi-Fi Signals With Disconnected Video Feeds, Quang Hai Truong, Dheryta Jaisinghani, Shubham Jain, Arunesh Sinha, Jeong Gil Ko, Rajesh Krishna Balan
Tracking People Across Ultra Populated Indoor Spaces By Matching Unreliable Wi-Fi Signals With Disconnected Video Feeds, Quang Hai Truong, Dheryta Jaisinghani, Shubham Jain, Arunesh Sinha, Jeong Gil Ko, Rajesh Krishna Balan
Research Collection School Of Computing and Information Systems
Tracking in dense indoor environments where several thousands of people move around is an extremely challenging problem. In this paper, we present a system — DenseTrack for tracking people in such environments. DenseTrack leverages data from the sensing modalities that are already present in these environments — Wi-Fi (from enterprise network deployments) and Video (from surveillance cameras). We combine Wi-Fi information with video data to overcome the individual errors induced by these modalities. More precisely, the locations derived from video are used to overcome the localization errors inherent in using Wi-Fi signals where precise Wi-Fi MAC IDs are used to …
Efficient Unsupervised Video Hashing With Contextual Modeling And Structural Controlling, Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, Xiang Wang
Efficient Unsupervised Video Hashing With Contextual Modeling And Structural Controlling, Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, Xiang Wang
Research Collection School Of Computing and Information Systems
The most important effect of the video hashing technique is to support fast retrieval, which is benefiting from the high efficiency of binary calculation. Current video hash approaches are thus mainly targeted at learning compact binary codes to represent video content accurately. However, they may overlook the generation efficiency for hash codes, i.e., designing lightweight neural networks. This paper proposes an method, which is not only for computing compact hash codes but also for designing a lightweight deep model. Specifically, we present an MLP-based model, where the video tensor is split into several groups and multiple axial contexts are explored …
Glance To Count: Learning To Rank With Anchors For Weakly-Supervised Crowd Counting, Zheng Xiong, Liangyu Chai, Wenxi Liu, Yongtuo Liu, Sucheng Ren, Shengfeng He
Glance To Count: Learning To Rank With Anchors For Weakly-Supervised Crowd Counting, Zheng Xiong, Liangyu Chai, Wenxi Liu, Yongtuo Liu, Sucheng Ren, Shengfeng He
Research Collection School Of Computing and Information Systems
Crowd image is arguably one of the most laborious data to annotate. In this paper, we devote to reduce the massive demand of densely labeled crowd data, and propose a novel weakly-supervised setting, in which we leverage the binary ranking of two images with highcontrast crowd counts as training guidance. To enable training under this new setting, we convert the crowd count regression problem to a ranking potential prediction problem. In particular, we tailor a Siamese Ranking Network that predicts the potential scores of two images indicating the ordering of the counts. Hence, the ultimate goal is to assign appropriate …
Constructing Holistic Spatio-Temporal Scene Graph For Video Semantic Role Labeling, Yu Zhao, Hao Fei, Yixin Cao, Bobo Li, Meishan Zhang, Jianguo Wei, Min Zhang, Tat-Seng Chua
Constructing Holistic Spatio-Temporal Scene Graph For Video Semantic Role Labeling, Yu Zhao, Hao Fei, Yixin Cao, Bobo Li, Meishan Zhang, Jianguo Wei, Min Zhang, Tat-Seng Chua
Research Collection School Of Computing and Information Systems
As one of the core video semantic understanding tasks, Video Semantic Role Labeling (VidSRL) aims to detect the salient events from given videos, by recognizing the predict-argument event structures and the interrelationships between events. While recent endeavors have put forth methods for VidSRL, they can be mostly subject to two key drawbacks, including the lack of fine-grained spatial scene perception and the insufficiently modeling of video temporality. Towards this end, this work explores a novel holistic spatio-temporal scene graph (namely HostSG) representation based on the existing dynamic scene graph structures, which well model both the fine-grained spatial semantics and temporal …
Npf-200: A Multi-Modal Eye Fixation Dataset And Method For Non-Photorealistic Videos, Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He
Npf-200: A Multi-Modal Eye Fixation Dataset And Method For Non-Photorealistic Videos, Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He
Research Collection School Of Computing and Information Systems
Non-photorealistic videos are in demand with the wave of the metaverse, but lack of sufficient research studies. This work aims to take a step forward to understand how humans perceive nonphotorealistic videos with eye fixation (i.e., saliency detection), which is critical for enhancing media production, artistic design, and game user experience. To fill in the gap of missing a suitable dataset for this research line, we present NPF-200, the first largescale multi-modal dataset of purely non-photorealistic videos with eye fixations. Our dataset has three characteristics: 1) it contains soundtracks that are essential according to vision and psychological studies; 2) it …
Matk: The Meme Analytical Tool Kit, Ming Shan Hee, Aditi Kumaresan, Nguyen Khoi Hoang, Nirmalendu Prakash, Rui Cao, Roy Ka-Wei Lee
Matk: The Meme Analytical Tool Kit, Ming Shan Hee, Aditi Kumaresan, Nguyen Khoi Hoang, Nirmalendu Prakash, Rui Cao, Roy Ka-Wei Lee
Research Collection School Of Computing and Information Systems
The rise of social media platforms has brought about a new digital culture called memes. Memes, which combine visuals and text, can strongly influence public opinions on social and cultural issues. As a result, people have become interested in categorizing memes, leading to the development of various datasets and multimodal models that show promising results in this field. However, there is currently a lack of a single library that allows for the reproduction, evaluation, and comparison of these models using fair benchmarks and settings. To fill this gap, we introduce the Meme Analytical Tool Kit (MATK), an open-source toolkit specifically …
Underwater Image Translation Via Multi-Scale Generative Adversarial Network, Dongmei Yang, Tianzi Zhang, Boquan Li, Menghao Li, Weijing Chen, Xiaoqing Li, Xingmei Wang
Underwater Image Translation Via Multi-Scale Generative Adversarial Network, Dongmei Yang, Tianzi Zhang, Boquan Li, Menghao Li, Weijing Chen, Xiaoqing Li, Xingmei Wang
Research Collection School Of Computing and Information Systems
The role that underwater image translation plays assists in generating rare images for marine applications. However, such translation tasks are still challenging due to data lacking, insufficient feature extraction ability, and the loss of content details. To address these issues, we propose a novel multi-scale image translation model based on style-independent discriminators and attention modules (SID-AM-MSITM), which learns the mapping relationship between two unpaired images for translation. We introduce Convolution Block Attention Modules (CBAM) to the generators and discriminators of SID-AM-MSITM to improve its feature extraction ability. Moreover, we construct style-independent discriminators that enable the discriminant results of SID-AM-MSITM to …
Adavis: Adaptive And Explainable Visualization Recommendation For Tabular Data, Songheng Zhang, Yong Wang, Haotian Li, Huamin Qu
Adavis: Adaptive And Explainable Visualization Recommendation For Tabular Data, Songheng Zhang, Yong Wang, Haotian Li, Huamin Qu
Research Collection School Of Computing and Information Systems
Automated visualization recommendation facilitates the rapid creation of effective visualizations, which is especially beneficial for users with limited time and limited knowledge of data visualization. There is an increasing trend in leveraging machine learning (ML) techniques to achieve an end-to-end visualization recommendation. However, existing ML-based approaches implicitly assume that there is only one appropriate visualization for a specific dataset, which is often not true for real applications. Also, they often work like a black box, and are difficult for users to understand the reasons for recommending specific visualizations. To fill the research gap, we propose AdaVis, an adaptive and explainable …
Gnnlens: A Visual Analytics Approach For Prediction Error Diagnosis Of Graph Neural Networks., Zhihua Jin, Yong Wang, Qianwen Wang, Yao Ming, Tengfei Ma, Huamin Qu
Gnnlens: A Visual Analytics Approach For Prediction Error Diagnosis Of Graph Neural Networks., Zhihua Jin, Yong Wang, Qianwen Wang, Yao Ming, Tengfei Ma, Huamin Qu
Research Collection School Of Computing and Information Systems
Graph Neural Networks (GNNs) aim to extend deep learning techniques to graph data and have achieved significant progress in graph analysis tasks (e.g., node classification) in recent years. However, similar to other deep neural networks like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), GNNs behave like a black box with their details hidden from model developers and users. It is therefore difficult to diagnose possible errors of GNNs. Despite many visual analytics studies being done on CNNs and RNNs, little research has addressed the challenges for GNNs. This paper fills the research gap with an interactive visual analysis …
Chatgpt As Metamorphosis Designer For The Future Of Artificial Intelligence (Ai): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian)
Chatgpt As Metamorphosis Designer For The Future Of Artificial Intelligence (Ai): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian)
Library Philosophy and Practice (e-journal)
Abstract
Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative designer tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near about human intelligent systems for futuristic used and developed in the field of Artificial Intelligence (AI). Also with the helps of this paper, researchers are analyzed the strengths and weaknesses of ChatGPT as a tool, and identify possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …
Daot: Domain-Agnostically Aligned Optimal Transport For Domain-Adaptive Crowd Counting, Huilin Zhu, Jingling Yuan, Xian Zhong, Zhengwei Yang, Zheng Wang, Shengfeng He
Daot: Domain-Agnostically Aligned Optimal Transport For Domain-Adaptive Crowd Counting, Huilin Zhu, Jingling Yuan, Xian Zhong, Zhengwei Yang, Zheng Wang, Shengfeng He
Research Collection School Of Computing and Information Systems
Domain adaptation is commonly employed in crowd counting to bridge the domain gaps between different datasets. However, existing domain adaptation methods tend to focus on inter-dataset differences while overlooking the intra-differences within the same dataset, leading to additional learning ambiguities. These domain-agnostic factors,e.g., density, surveillance perspective, and scale, can cause significant in-domain variations, and the misalignment of these factors across domains can lead to a drop in performance in cross-domain crowd counting. To address this issue, we propose a Domain-agnostically Aligned Optimal Transport (DAOT) strategy that aligns domain-agnostic factors between domains. The DAOT consists of three steps. First, individual-level differences …
Equivariance And Invariance Inductive Bias For Learning From Insufficient Data, Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang
Equivariance And Invariance Inductive Bias For Learning From Insufficient Data, Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang
Research Collection School Of Computing and Information Systems
We are interested in learning robust models from insufficient data, without the need for any externally pre-trained model checkpoints. First, compared to sufficient data, we show why insufficient data renders the model more easily biased to the limited training environments that are usually different from testing. For example, if all the training "swan" samples are "white", the model may wrongly use the "white" environment to represent the intrinsic class "swan". Then, we justify that equivariance inductive bias can retain the class feature while invariance inductive bias can remove the environmental feature, leaving only the class feature that generalizes to any …
A Large-Scale Benchmark For Food Image Segmentation, Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven C. H. Hoi, Qianru Sun
A Large-Scale Benchmark For Food Image Segmentation, Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven C. H. Hoi, Qianru Sun
Research Collection School Of Computing and Information Systems
Food image segmentation is a critical and indispensible task for developing health-related applications such as estimating food calories and nutrients. Existing food image segmentation models are underperforming due to two reasons: (1) there is a lack of high quality food image datasets with fine-grained ingredient labels and pixel-wise location masks—the existing datasets either carry coarse ingredient labels or are small in size; and (2) the complex appearance of food makes it difficult to localize and recognize ingredients in food images, e.g., the ingredients may overlap one another in the same image, and the identical ingredient may appear distinctly in different …
Delving Deep Into Many-To-Many Attention For Few-Shot Video Object Segmentation, Haoxin Chen, Hanjie Wu, Nanxuan Zhao, Sucheng Ren, Shengfeng He
Delving Deep Into Many-To-Many Attention For Few-Shot Video Object Segmentation, Haoxin Chen, Hanjie Wu, Nanxuan Zhao, Sucheng Ren, Shengfeng He
Research Collection School Of Computing and Information Systems
This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects in the query videos with certain class specified in a few labeled support images. The key is to model the relationship between the query videos and the support images for propagating the object information. This is a many-to-many problem and often relies on full-rank attention, which is computationally intensive. In this paper, we propose a novel Domain Agent Network (DAN), breaking down the full-rank attention into two smaller ones. We consider one single frame of the query video as the domain agent, bridging between the support …
Characterizing Students’ Engineering Design Strategies Using Energy3d, Jasmine Singh, Viranga Perera, Alejandra Magana, Brittany Newell
Characterizing Students’ Engineering Design Strategies Using Energy3d, Jasmine Singh, Viranga Perera, Alejandra Magana, Brittany Newell
Discovery Undergraduate Interdisciplinary Research Internship
The goals of this study are to characterize design actions that students performed when solving a design challenge, and to create a machine learning model to help future students make better engineering design choices. We analyze data from an introductory engineering course where students used Energy3D, an open source computer-aided design software, to design a zero-energy home (i.e. a home that consumes no net energy over a period of a year). Student design actions within the software were recorded into text files. Using a sample of over 300 students, we first identify patterns in the data to assess how students …
Espade: An Efficient And Semantically Secure Shortest Path Discovery For Outsourced Location-Based Services, Bharath K. Samanthula, Divyadharshini Karthikeyan, Boxiang Dong, K. Anitha Kumari
Espade: An Efficient And Semantically Secure Shortest Path Discovery For Outsourced Location-Based Services, Bharath K. Samanthula, Divyadharshini Karthikeyan, Boxiang Dong, K. Anitha Kumari
Department of Computer Science Faculty Scholarship and Creative Works
With the rapid growth of smart devices and technological advancements in tracking geospatial data, the demand for Location-Based Services (LBS) is facing a constant rise in several domains, including military, healthcare and transportation. It is a natural step to migrate LBS to a cloud environment to achieve on-demand scalability and increased resiliency. Nonetheless, outsourcing sensitive location data to a third-party cloud provider raises a host of privacy concerns as the data owners have reduced visibility and control over the outsourced data. In this paper, we consider outsourced LBS where users want to retrieve map directions without disclosing their location information. …
Storage Management Strategy In Mobile Phones For Photo Crowdsensing, En Wang, Zhengdao Qu, Xinyao Liang, Xiangyu Meng, Yongjian Yang, Dawei Li, Weibin Meng
Storage Management Strategy In Mobile Phones For Photo Crowdsensing, En Wang, Zhengdao Qu, Xinyao Liang, Xiangyu Meng, Yongjian Yang, Dawei Li, Weibin Meng
Department of Computer Science Faculty Scholarship and Creative Works
In mobile crowdsensing, some users jointly finish a sensing task through the sensors equipped in their intelligent terminals. In particular, the photo crowdsensing based on Mobile Edge Computing (MEC) collects pictures for some specific targets or events and uploads them to nearby edge servers, which leads to richer data content and more efficient data storage compared with the common mobile crowdsensing; hence, it has attracted an important amount of attention recently. However, the mobile users prefer uploading the photos through Wifi APs (PoIs) rather than cellular networks. Therefore, photos stored in mobile phones are exchanged among users, in order to …
Gender And Racial Diversity In Commercial Brands' Advertising Images On Social Media, Jisun An, Haewoon Kwak
Gender And Racial Diversity In Commercial Brands' Advertising Images On Social Media, Jisun An, Haewoon Kwak
Research Collection School Of Computing and Information Systems
Gender and racial diversity in the mediated images from the media shape our perception of different demographic groups. In this work, we investigate gender and racial diversity of 85,957 advertising images shared by the 73 top international brands on Instagram and Facebook. We hope that our analyses give guidelines on how to build a fully automated watchdog for gender and racial diversity in online advertisements.
Ancr—An Adaptive Network Coding Routing Scheme For Wsns With Different-Success-Rate Links †, Xiang Ji, Anwen Wang, Chunyu Li, Chun Ma, Yao Peng, Dajin Wang, Qingyi Hua, Feng Chen, Dingyi Fang
Ancr—An Adaptive Network Coding Routing Scheme For Wsns With Different-Success-Rate Links †, Xiang Ji, Anwen Wang, Chunyu Li, Chun Ma, Yao Peng, Dajin Wang, Qingyi Hua, Feng Chen, Dingyi Fang
Department of Computer Science Faculty Scholarship and Creative Works
As the underlying infrastructure of the Internet of Things (IoT), wireless sensor networks (WSNs) have been widely used in many applications. Network coding is a technique in WSNs to combine multiple channels of data in one transmission, wherever possible, to save node’s energy as well as increase the network throughput. So far most works on network coding are based on two assumptions to determine coding opportunities: (1) All the links in the network have the same transmission success rate; (2) Each link is bidirectional, and has the same transmission success rate on both ways. However, these assumptions may not be …
Spica: Stereographic Projection For Interactive Crystallographic Analysis, Xingzhong Li
Spica: Stereographic Projection For Interactive Crystallographic Analysis, Xingzhong Li
Nebraska Center for Materials and Nanoscience: Faculty Publications
In numerous research fields, especially the applications of electron and X-ray diffraction, stereographic projection represents a powerful tool for researchers. SPICA is a new computer program for stereographic projection in interactive crystallographic analysis, which inherits features from the previous JECP/SP and includes more functions for extensive crystallographic analysis. SPICA provides fully interactive options for users to plot stereograms of crystal directions and crystal planes, traces, and Kikuchi maps for an arbitrary crystal structure; it can be used to explore the orientation relationships between two crystalline phases with a composite stereogram; it is also used to predict the tilt angles of …
An Immersive Telepresence System Using Rgb-D Sensors And Head-Mounted Display, Xinzhong Lu, Ju Shen, Saverio Perugini, Jianjun Yang
An Immersive Telepresence System Using Rgb-D Sensors And Head-Mounted Display, Xinzhong Lu, Ju Shen, Saverio Perugini, Jianjun Yang
Computer Science Faculty Publications
We present a tele-immersive system that enables people to interact with each other in a virtual world using body gestures in addition to verbal communication. Beyond the obvious applications, including general online conversations and gaming, we hypothesize that our proposed system would be particularly beneficial to education by offering rich visual contents and interactivity. One distinct feature is the integration of egocentric pose recognition that allows participants to use their gestures to demonstrate and manipulate virtual objects simultaneously. This functionality enables the instructor to effectively and efficiently explain and illustrate complex concepts or sophisticated problems in an intuitive manner. The …
Automatic Video Self Modeling For Voice Disorder, Ju Shen, Changpeng Ti, Anusha Raghunathan, Sen-Ching S. Cheung, Rita Patel
Automatic Video Self Modeling For Voice Disorder, Ju Shen, Changpeng Ti, Anusha Raghunathan, Sen-Ching S. Cheung, Rita Patel
Computer Science Faculty Publications
Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. In the field of speech language pathology, the approach of VSM has been successfully used for treatment of language in children with Autism and in individuals with fluency disorder of stuttering. Technical challenges remain in creating VSM contents that depict previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed …
Compression Of Video Tracking And Bandwidth Balancing Routing In Wireless Multimedia Sensor Networks, Yin Wang, Jianjun Yang, Ju Shen, Bryson Payne, Juan Guo, Kun Hua
Compression Of Video Tracking And Bandwidth Balancing Routing In Wireless Multimedia Sensor Networks, Yin Wang, Jianjun Yang, Ju Shen, Bryson Payne, Juan Guo, Kun Hua
Computer Science Faculty Publications
There has been a tremendous growth in multimedia applications over wireless networks. Wireless Multimedia Sensor Networks(WMSNs) have become the premier choice in many research communities and industry. Many state-of-art applications, such as surveillance, traffic monitoring, and remote heath care are essentially video tracking and transmission in WMSNs. The transmission speed is constrained by the big file size of video data and fixed bandwidth allocation in constant routing paths. In this paper, we present a CamShift based algorithm to compress the tracking of videos. Then we propose a bandwidth balancing strategy in which each sensor node is able to dynamically select …
Leading Undergraduate Students To Big Data Generation, Jianjun Yang, Ju Shen
Leading Undergraduate Students To Big Data Generation, Jianjun Yang, Ju Shen
Computer Science Faculty Publications
People are facing a flood of data today. Data are being collected at unprecedented scale in many areas, such as networking, image processing, virtualization, scientific computation, and algorithms. The huge data nowadays are called Big Data. Big data is an all encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. In this article, the authors present a unique way which uses network simulator and tools of image processing to train students abilities to learn, analyze, manipulate, and apply Big Data. Thus they develop students hands-on …
Person Identification From Streaming Surveillance Video Using Mid-Level Features From Joint Action-Pose Distribution, Binu M. Nair, Vijayan K. Asari
Person Identification From Streaming Surveillance Video Using Mid-Level Features From Joint Action-Pose Distribution, Binu M. Nair, Vijayan K. Asari
Electrical and Computer Engineering Faculty Publications
We propose a real time person identification algorithm for surveillance based scenarios from low-resolution streaming video, based on mid-level features extracted from the joint distribution of various types of human actions and human poses.
The proposed algorithm uses the combination of an auto-encoder based action association framework which produces per-frame probability estimates of the action being performed, and a pose recognition framework which gives per-frame body part locations.
The main focus in this manuscript is to effectively combine these per-frame action probability estimates and pose trajectories from a short temporal window to obtain mid-level features. We demonstrate that these mid-level …
Hole Detection And Shape-Free Representation And Double Landmarks Based Geographic Routing In Wireless Sensor Networks, Jianjun Yang, Zongming Fei, Ju Shen
Hole Detection And Shape-Free Representation And Double Landmarks Based Geographic Routing In Wireless Sensor Networks, Jianjun Yang, Zongming Fei, Ju Shen
Computer Science Faculty Publications
In wireless sensor networks, an important issue of geographic routing is “local minimum” problem, which is caused by a “hole” that blocks the greedy forwarding process. Existing geographic routing algorithms use perimeter routing strategies to find a long detour path when such a situation occurs. To avoid the long detour path, recent research focuses on detecting the hole in advance, then the nodes located on the boundary of the hole advertise the hole information to the nodes near the hole. Hence the long detour path can be avoided in future routing. We propose a heuristic hole detecting algorithm which identifies …
Seeing Human Weight From A Single Rgb-D Image, Tam Nguyen, Jiashi Feng, Shuicheng Yan
Seeing Human Weight From A Single Rgb-D Image, Tam Nguyen, Jiashi Feng, Shuicheng Yan
Computer Science Faculty Publications
Human weight estimation is useful in a variety of potential applications, e.g., targeted advertisement, entertainment scenarios and forensic science. However, estimating weight only from color cues is particularly challenging since these cues are quite sensitive to lighting and imaging conditions. In this article, we propose a novel weight estimator based on a single RGB-D image, which utilizes the visual color cues and depth information. Our main contributions are three-fold.
First, we construct the W8-RGBD dataset including RGB-D images of different people with ground truth weight.
Second, the novel sideview shape feature and the feature fusion model are proposed to facilitate …
Structure Preserving Large Imagery Reconstruction, Ju Shen, Jianjun Yang, Sami Taha Abu Sneineh, Bryson Payne, Markus Hitz
Structure Preserving Large Imagery Reconstruction, Ju Shen, Jianjun Yang, Sami Taha Abu Sneineh, Bryson Payne, Markus Hitz
Computer Science Faculty Publications
With the explosive growth of web-based cameras and mobile devices, billions of photographs are uploaded to the internet. We can trivially collect a huge number of photo streams for various goals, such as image clustering, 3D scene reconstruction, and other big data applications. However, such tasks are not easy due to the fact the retrieved photos can have large variations in their view perspectives, resolutions, lighting, noises, and distortions. Furthermore, with the occlusion of unexpected objects like people, vehicles, it is even more challenging to find feature correspondences and reconstruct realistic scenes. In this paper, we propose a structure-based image …
Automatic Objects Removal For Scene Completion, Jianjun Yang, Yin Wang, Honggang Wang, Kun Hua, Wei Wang, Ju Shen
Automatic Objects Removal For Scene Completion, Jianjun Yang, Yin Wang, Honggang Wang, Kun Hua, Wei Wang, Ju Shen
Computer Science Faculty Publications
With the explosive growth of Web-based cameras and mobile devices, billions of photographs are uploaded to the Internet. We can trivially collect a huge number of photo streams for various goals, such as 3D scene reconstruction and other big data applications. However, this is not an easy task due to the fact the retrieved photos are neither aligned nor calibrated. Furthermore, with the occlusion of unexpected foreground objects like people, vehicles, it is even more challenging to find feature correspondences and reconstruct realistic scenes. In this paper, we propose a structure-based image completion algorithm for object removal that produces visually …