
Object detection


Articles 1 - 30 of 35

Full-Text Articles in Artificial Intelligence and Robotics

Scene Graph Generation: A Comprehensive Survey, Hongsheng Li, Guangming Zhu, Liang Zhang, Youliang Jiang, Yixuan Dang, Haoran Hou, Peiyi Shen, Xia Zhao, Syed A. A. Shah, Mohammed Bennamoun Jan 2024


Research outputs 2022 to 2026

Deep learning techniques have led to remarkable breakthroughs in the field of object detection and have spawned many scene-understanding tasks in recent years. The scene graph has been a focus of research because of its powerful semantic representation and its applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image or a video into a semantic structural scene graph, which requires the correct labeling of detected objects and their relationships. In this paper, a comprehensive survey of recent achievements is provided. This survey attempts to connect and systematize the existing visual relationship detection …
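
For readers new to the formalism, a scene graph is simply a set of labeled object nodes connected by directed predicate edges. The minimal sketch below (not taken from the survey; the class and field names are illustrative) shows the kind of structure an SGG model outputs:

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    label: str          # e.g. "person"
    box: tuple          # (x1, y1, x2, y2) in pixel coordinates
    score: float = 1.0  # detector confidence

@dataclass
class Relationship:
    subject: int        # index of the subject node
    predicate: str      # e.g. "riding"
    obj: int            # index of the object node

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def triplets(self):
        """Yield human-readable subject-predicate-object triplets."""
        for r in self.edges:
            yield (self.nodes[r.subject].label, r.predicate, self.nodes[r.obj].label)

# Toy example encoding "person riding horse".
g = SceneGraph(
    nodes=[ObjectNode("person", (10, 20, 80, 200)), ObjectNode("horse", (5, 90, 220, 260))],
    edges=[Relationship(subject=0, predicate="riding", obj=1)],
)
print(list(g.triplets()))  # [('person', 'riding', 'horse')]
```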


Uavs And Deep Neural Networks: An Alternative Approach To Monitoring Waterfowl At The Site Level, Zachary J. Loken Nov 2023


LSU Master's Theses

Understanding how waterfowl respond to habitat restoration and management activities is crucial for evaluating and refining conservation delivery programs. However, site-specific waterfowl monitoring is challenging, especially in heavily forested systems such as the Mississippi Alluvial Valley (MAV)—a primary wintering region for ducks in North America. I hypothesized that using uncrewed aerial vehicles (UAVs) coupled with deep learning-based methods for object detection would provide an efficient and effective means for surveying non-breeding waterfowl on difficult-to-access restored wetland sites. Accordingly, during the winters of 2021 and 2022, I surveyed wetland restoration easements in the MAV using a UAV equipped with a dual …


Modularized Zero-Shot Vqa With Pre-Trained Models, Rui Cao, Jing Jiang Jul 2023


Research Collection School Of Computing and Information Systems

Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often require multiple steps of reasoning, which is still a capability that most PTMs lack. Second, different steps in VQA reasoning chains require different skills such as object detection and relational reasoning, but a single PTM may not possess all these skills. Third, recent work on zero-shot VQA does not explicitly consider multi-step reasoning chains, which makes such methods less interpretable than a decomposition-based approach. We propose …


An Approach To Lunar Regolith Particle Detection And Classification Using Deep Learning, Hira Nadeem Apr 2023


Electronic Thesis and Dissertation Repository

Lunar regolith, unconsolidated rock on the lunar surface, is made up of various particles. Understanding the quantities and locations of these particles on the lunar surface is of particular interest to planetary scientists for mission planning and science objectives. There is a limited supply of lunar regolith samples available on Earth for planetary scientists to characterize. Lunar rover missions over the next decade are expected to provide high-resolution images of the lunar surface. Deep learning can be leveraged to analyze lunar regolith from image data. An object detection model using transfer learning was developed to identify and classify particles of …
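
The abstract is truncated here, but the transfer-learning recipe it describes is commonly implemented by taking a detector pre-trained on a large dataset and replacing its classification head for the new particle classes. A hedged sketch using torchvision (the class names and counts are placeholders, not the thesis's actual setup):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Hypothetical classes for illustration: background + two particle types (placeholder names).
NUM_CLASSES = 3  # 0 = background, 1 = agglutinate, 2 = glass sphere

# Start from a detector pre-trained on COCO and replace its classification head,
# so only a limited amount of labeled lunar imagery is needed for fine-tuning.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# One fine-tuning step on a dummy batch (replace with real regolith images and boxes).
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100., 120., 180., 200.]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)   # dict of detection losses
total = sum(losses.values())
total.backward()
```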


A Multistage Framework For Detection Of Very Small Objects, Duleep Rathgamage Don, Ramazan Aygun, Mahmut Karakaya Jan 2023


Published and Grey Literature from PhD Candidates

Small object detection is one of the most challenging problems in computer vision. Algorithms based on state-of-the-art object detection methods such as R-CNN, SSD, FPN, and YOLO fail to detect objects of very small sizes. In this study, we propose a novel method to detect very small objects, smaller than 8×8 pixels, that appear in a complex background. The proposed method is a multistage framework consisting of an unsupervised algorithm and three separately trained supervised algorithms. The unsupervised algorithm extracts ROIs from a high-resolution image. Then the ROIs are upsampled using SRGAN, and the enhanced ROIs are detected by our …
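
The staged flow described above (unsupervised ROI extraction, SRGAN-based upsampling, then supervised detection) can be pictured with the toy sketch below; every stage here is a crude placeholder standing in for the authors' trained models:

```python
import numpy as np

def extract_rois(image, tile=64, intensity_thresh=0.2):
    """Placeholder unsupervised stage: return coarse tiles whose contrast suggests a candidate."""
    rois = []
    h, w = image.shape[:2]
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = image[y:y + tile, x:x + tile]
            if patch.std() > intensity_thresh:      # crude saliency proxy
                rois.append((x, y, tile, tile))
    return rois

def super_resolve(patch, scale=4):
    """Placeholder for the SRGAN upsampling stage (here: naive nearest-neighbour upscaling)."""
    return np.kron(patch, np.ones((scale, scale)))

def detect(patch):
    """Placeholder for the supervised detector applied to the enhanced ROI."""
    return [{"box": (0, 0, patch.shape[1], patch.shape[0]), "score": 0.5}]

def multistage_detect(image):
    detections = []
    for (x, y, w, h) in extract_rois(image):
        enhanced = super_resolve(image[y:y + h, x:x + w])
        for det in detect(enhanced):
            detections.append({"roi": (x, y, w, h), **det})
    return detections

print(len(multistage_detect(np.random.rand(512, 512))))
```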


Simulation Research On Appearance Detection Of Ampoules Based On Lightweight Network And Model Compression, Zhihao Zhu, Yan Wang, Zhicheng Ji Dec 2022


Journal of System Simulation

Abstract: To address the large scale and redundant parameters of object detection network models, which make it difficult to deploy an ampoule appearance defect detection model to edge devices, an LC-Faster R-CNN defect detection algorithm based on a lightweight network and model compression is proposed. MobileNet-V2 is used as the backbone, and redundant channels in the convolutional network are trimmed with a model pruning strategy. The floating-point parameters are quantized into integers through a saturation truncation mapping, and knowledge distillation is used to restore the accuracy of the compressed network. Tested on a self-built ampoule appearance defect dataset, the model volume …
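
The saturation truncation mapping mentioned above is, in essence, symmetric linear quantization of floating-point parameters to 8-bit integers, with values beyond a clipping threshold saturated at the integer range limits. A minimal sketch of that idea (not the paper's implementation; the clipping rule is illustrative):

```python
import numpy as np

def quantize_int8(weights, clip=None):
    """Symmetric linear quantization with saturation truncation:
    values beyond +/- clip saturate to the int8 range instead of widening the scale."""
    if clip is None:
        clip = np.abs(weights).max()
    scale = clip / 127.0
    q = np.round(weights / scale)
    q = np.clip(q, -127, 127).astype(np.int8)   # saturation truncation
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256).astype(np.float32)
q, s = quantize_int8(w, clip=np.percentile(np.abs(w), 99.5))  # clip outliers, saturate the rest
w_hat = dequantize(q, s)
print("mean abs error:", np.abs(w - w_hat).mean())
```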


Physical Adversarial Attack On A Robotic Arm, Yifan Jia, Christopher M. Poskitt, Jun Sun, Sudipta Chattopadhyay Oct 2022


Research Collection School Of Computing and Information Systems

Collaborative Robots (cobots) are regarded as highly safety-critical cyber-physical systems (CPSs) owing to their close physical interactions with humans. In settings such as smart factories, they are frequently augmented with AI. For example, in order to move materials, cobots utilize object detectors based on deep learning models. Deep learning, however, has been demonstrated as vulnerable to adversarial attacks: a minor change (noise) to benign input can fool the underlying neural networks and lead to a different result. While existing works have explored such attacks in the context of picture/object classification, less attention has been given to attacking neural networks used …


Cvfnet: Real-Time 3d Object Detection By Learning Cross View Features, Jiaqi Gu, Zhiyu Xiang, Pan Zhao, Tingming Bai, Lingxuan Wang, Xijun Zhao, Zhiyuan Zhang Oct 2022


Research Collection School Of Computing and Information Systems

In recent years, 3D object detection from LiDAR point clouds has made great progress thanks to the development of deep learning technologies. Although voxel- or point-based methods are popular in 3D object detection, they usually involve time-consuming operations such as 3D convolutions on voxels or ball query among points, making the resulting network inappropriate for time-critical applications. On the other hand, 2D view-based methods feature high computing efficiency while usually obtaining inferior performance to voxel- or point-based methods. In this work, we present a real-time view-based single-stage 3D object detector, namely CVFNet, to fulfill this …


Avist: A Benchmark For Visual Object Tracking In Adverse Visibility, Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Hisham Cholakkal, Salman Khan, Luc Van Gool, Fahad Shahbaz Khan Aug 2022


Computer Vision Faculty Publications

One of the key factors behind the recent success in visual tracking is the availability of dedicated benchmarks. While greatly benefiting tracking research, existing benchmarks do not pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility such as severe weather conditions, camouflage and imaging effects. We introduce AVisT, a dedicated benchmark for visual tracking in diverse scenarios with adverse visibility. AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios …


Adversarial Pixel Restoration As A Pretext Task For Transferable Perturbations, Hashmat Shadab Malik, Shahina K. Kunhimon, Muzammal Nasser, Salman Khan, Fahad Shahbaz Khan Jul 2022


Computer Vision Faculty Publications

Transferable adversarial attacks optimize adversaries on a pretrained surrogate model with a known label space in order to fool unknown black-box models, and are therefore restricted by the availability of an effective surrogate model. In this work, we relax this assumption and propose Adversarial Pixel Restoration as a self-supervised alternative for training an effective surrogate model from scratch with no labels and few data samples. Our training approach is based on a min-max objective which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model. Our proposed attack is complementary to our adversarial …
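
As a rough illustration of a min-max objective of this flavor (heavily simplified and not the authors' training code; the restoration pretext, network, and hyperparameters below are placeholders), an inner loop crafts a perturbation that maximizes a pixel-restoration loss while the outer step updates the surrogate on the perturbed input:

```python
import torch
import torch.nn as nn

# Toy surrogate: a small conv network trained to restore randomly dropped pixels.
surrogate = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
mse = nn.MSELoss()

def adv_restoration_step(clean, eps=8 / 255, steps=3, alpha=2 / 255):
    """One min-max step: the inner loop maximizes the restoration loss over a bounded
    perturbation; the outer step minimizes it on the perturbed input."""
    corrupted = clean * (torch.rand_like(clean) > 0.5).float()   # pixel dropout as the pretext
    delta = torch.zeros_like(corrupted, requires_grad=True)
    for _ in range(steps):                                       # inner maximization (PGD-style)
        loss = mse(surrogate(corrupted + delta), clean)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    opt.zero_grad()
    outer = mse(surrogate(corrupted + delta.detach()), clean)    # outer minimization
    outer.backward()
    opt.step()
    return outer.item()

print(adv_restoration_step(torch.rand(4, 3, 32, 32)))
```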


Bridging The Gap Between Object And Image-Level Representations For Open-Vocabulary Detection, Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan Jul 2022


Computer Vision Faculty Publications

Existing open-vocabulary object detectors typically enlarge their vocabulary sizes by leveraging different forms of weak supervision. This helps generalize to novel objects at inference. Two popular forms of weak supervision used in open-vocabulary detection (OVD) include pretrained CLIP models and image-level supervision. We note that neither of these modes of supervision is optimally aligned with the detection task: CLIP is trained with image-text pairs and lacks precise localization of objects, while image-level supervision has been used with heuristics that do not accurately specify local object regions. In this work, we propose to address this problem by performing object-centric alignment of …


Incorporating Spatial Relationship Information In Signal-To-Text Processing, Jeremy Elon Davis May 2022


Theses and Dissertations

This dissertation outlines the development of a signal-to-text system that incorporates spatial relationship information to generate scene descriptions. Existing signal-to-text systems generate accurate descriptions with respect to the information contained in an image; however, to date, no signal-to-text system incorporates spatial relationship information. A survey of related work in the fields of object detection, signal-to-text, and spatial relationships in images is presented first. Three methodologies, each followed by an evaluation, were conducted in order to create the signal-to-text system: 1) generation of object localization results from a set of input images, 2) derivation of Level One Summaries from an input image, and …
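
As a toy illustration of how a spatial relationship might be derived from two object localization results (the predicate set and the center-based rule here are illustrative, not the dissertation's Level One method):

```python
def spatial_relation(box_a, box_b):
    """Return where box_b lies relative to box_a.
    Boxes are (x1, y1, x2, y2) in image coordinates with y increasing downward."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2   # center of A
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2   # center of B
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right of" if dx > 0 else "left of"
    return "below" if dy > 0 else "above"

# e.g. a detected car relative to a detected person
person = (50, 100, 120, 200)
car = (300, 120, 420, 220)
print("car is", spatial_relation(person, car), "the person")   # -> car is right of the person
```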


Energy-Based Latent Aligner For Incremental Learning, K.J. Joseph, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Vineeth N. Balasubramanian Mar 2022


Computer Vision Faculty Publications

Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks. This behavior emerges because the parameter updates optimized for the new tasks may not align well with the updates suitable for older tasks. The resulting latent representation mismatch causes forgetting. In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values. This learned manifold is used to counter the representational shift that happens during incremental learning. The …
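
A heavily simplified sketch of the energy-manifold idea (not the paper's formulation; the architecture, objective, and alignment loop below are toy placeholders): train a small energy network to assign low energy to old-task latents, then move drifted latents down the energy gradient to counter the shift.

```python
import torch
import torch.nn as nn

# Toy energy model over latent vectors: low energy for old-task latents,
# higher energy for drifted latents produced after learning the new task.
energy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

def train_energy_step(old_latents, drifted_latents):
    """Push energy down on old-task latents and up on drifted ones."""
    loss = energy(old_latents).mean() - energy(drifted_latents).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def align(latent, steps=5, step_size=0.1):
    """Counter representational shift by moving a latent along the negative energy gradient."""
    z = latent.clone().detach().requires_grad_(True)
    for _ in range(steps):
        e = energy(z).sum()
        grad = torch.autograd.grad(e, z)[0]
        z = (z - step_size * grad).detach().requires_grad_(True)
    return z.detach()

old = torch.randn(32, 64)
drifted = old + 0.5 * torch.randn(32, 64)   # simulated shift after incremental learning
for _ in range(100):
    train_energy_step(old, drifted)
print(align(drifted[:1]).shape)             # aligned latent, shape (1, 64)
```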


Pseudo-Stereo For Monocular 3d Object Detection In Autonomous Driving, Yi-Nan Chen, Hang Dai, Yong Ding Mar 2022


Computer Vision Faculty Publications

Pseudo-LiDAR 3D detectors have made remarkable progress in monocular 3D detection by enhancing the capability of perceiving depth with depth estimation networks, and using LiDAR-based 3D detection architectures. The advanced stereo 3D detectors can also accurately localize 3D objects. The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation. Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods, including image-level generation, feature-level generation, and feature-clone, for detecting 3D objects from a single image. Our analysis of depth-aware learning shows that the depth loss is effective in …
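
Of the three virtual-view strategies named above, feature-clone is the simplest to picture: extract features from the single image and reuse a copy of them as the virtual right-view features for a stereo-style head. A toy sketch (the backbone and head are placeholders, not the paper's architecture):

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
stereo_head = nn.Conv2d(64, 16, 1)   # placeholder head consuming concatenated left/right features

def pseudo_stereo_features(image):
    left = backbone(image)            # features of the real (single) view
    right = left.clone()              # feature-clone: the virtual right view copies the left features
    return torch.cat([left, right], dim=1)

out = stereo_head(pseudo_stereo_features(torch.rand(1, 3, 128, 128)))
print(out.shape)                      # torch.Size([1, 16, 64, 64])
```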


Transformers In Vision: A Survey, Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah Jan 2022


Computer Vision Faculty Publications

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable the modeling of long dependencies between input sequence elements and support parallel processing of sequences, in contrast to recurrent networks such as long short-term memory (LSTM) networks. Unlike convolutional networks, Transformers require minimal inductive biases in their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing of multiple modalities (e.g., images, videos, text and speech) using similar processing blocks, and it demonstrates excellent scalability to very large capacity networks and huge …
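
The long-range dependency modeling highlighted here comes from self-attention, where every element of the input sequence attends to every other element in a single layer. A minimal scaled dot-product self-attention sketch over patch embeddings (generic, not any specific model from the survey):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention: every token attends to every other token,
    so dependencies of arbitrary range are captured in one layer."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

tokens = torch.randn(1, 196, 64)   # e.g. 14x14 image patches with 64-d embeddings
d = 64
w_q, w_k, w_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
print(self_attention(tokens, w_q, w_k, w_v).shape)   # torch.Size([1, 196, 64])
```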


Synthetic Augmentation Methods For Object Detection In Overhead Imagery, Nicholas R. Hamilton Jan 2022


Dissertations, Master's Theses and Master's Reports

The multidisciplinary area of geospatial intelligence (GEOINT) is continually changing and becoming more complex. From efforts to automate portions of GEOINT using machine learning, which augment the analyst and improve exploitation, to optimizing the growing number of sources and variables, there is no denying that the strategies involved in this collection method are rapidly progressing. The unique and inherent complexities involved in imagery analysis from an overhead perspective (e.g., target resolution, imaging band(s), and imaging angle) test the ability of even the most developed and novel machine learning techniques. To support advancement in the application of object detection in overhead imagery, we …


Ow-Detr: Open-World Detection Transformer, Akshita Gupta, Sanath Narayan, K.J. Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah Dec 2021


Computer Vision Faculty Publications

Open-world object detection (OWOD) is a challenging computer vision problem, where the task is to detect a known set of object categories while simultaneously identifying unknown objects. Additionally, the model must incrementally learn new classes that become known in the next training episodes. Distinct from standard object detection, the OWOD setting poses significant challenges for generating quality candidate proposals on potentially unknown objects, separating the unknown objects from the background and detecting diverse unknown objects. Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. The proposed OW-DETR comprises three dedicated components, namely attention-driven pseudo-labeling, novelty classification …


Multi-Modal Transformers Excel At Class-Agnostic Object Detection, Muhammad Maaz, Hanoona Bangalath Rasheed, Salman Hameed Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang Nov 2021


Computer Vision Faculty Publications

What constitutes an object? This has been a longstanding question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and for unseen objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. To bridge this gap, we explore recent Multi-modal Vision Transformers (MViT) that have been trained with aligned image-text pairs. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs to localize generic objects in images. Based on …


Methods For Detecting Floodwater On Roadways From Ground Level Images, Cem Sazara Jul 2021


Computational Modeling & Simulation Engineering Theses & Dissertations

Recent research and statistics show that the frequency of flooding in the world has been increasing and impacting flood-prone communities severely. This natural disaster causes significant damages to human life and properties, inundates roads, overwhelms drainage systems, and disrupts essential services and economic activities. The focus of this dissertation is to use machine learning methods to automatically detect floodwater in images from ground level in support of the frequently impacted communities. The ground level images can be retrieved from multiple sources, including the ones that are taken by mobile phone cameras as communities record the state of their flooded streets. …


Towards Open World Object Detection, K. J. Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N. Balasubramanian May 2021


Computer Vision Faculty Publications

Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them when the corresponding knowledge is eventually available. This motivates us to propose a novel computer vision problem called 'Open World Object Detection', where a model is tasked to: 1) identify objects that have not been introduced to it as 'unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol …


A Deep Learning-Based Automatic Object Detection Method For Autonomous Driving Ships, Ojonoka Erika Atawodi May 2021


Master's Theses

An important feature of an Autonomous Surface Vehicle (ASV) is its capability for automatic object detection, which allows it to avoid collisions and obstacles and to navigate on its own.

Deep learning has made significant headway in solving fundamental challenges associated with object detection and computer vision. With the tremendous demand for and advancement of the technologies associated with ASVs, interest in applying deep learning techniques to the challenges of autonomous ship driving has grown substantially over the years.

In this thesis, we study, design, and implement an object recognition framework that detects and recognizes objects found in the sea. We first curated …


Object Detection In Rgb-D Image Based On Annet, Cai Qiang, Liwei Wei, Haisheng Li, Cao Jian Aug 2020


Journal of System Simulation

Abstract: The wide spread of depth-image acquisition devices has made object detection in RGB-D images a hotspot in the field of computer vision. To make the features extracted by a CNN more robust and to improve detection accuracy, an improved CNN called ANNet was designed. To enhance the model's discriminability for local patches within the receptive field, some linear convolutional layers in AlexNet were replaced with nonlinear convolutional layers containing multilayer perceptrons, in place of the linear mapping between the convolution filter and the underlying data patch. The experimental results show that the detection accuracy is improved by 3% in …
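
Replacing a linear convolution with a "nonlinear convolution" containing a multilayer perceptron follows the network-in-network idea: a spatial convolution is followed by 1×1 convolutions, which apply a small MLP at every spatial position. A generic sketch of such a block (not the ANNet code; channel sizes are illustrative):

```python
import torch
import torch.nn as nn

def mlpconv(in_ch, out_ch, kernel_size=3):
    """A 'nonlinear convolution' block: a spatial conv followed by 1x1 convs,
    i.e. a small MLP applied at every receptive-field location."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True),   # cross-channel MLP layer 1
        nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True),   # cross-channel MLP layer 2
    )

block = mlpconv(3, 96)
print(block(torch.rand(1, 3, 224, 224)).shape)   # torch.Size([1, 96, 224, 224])
```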


Feature Pyramid Transformer, Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xian-Sheng Hua, Qianru Sun Aug 2020


Research Collection School Of Computing and Information Systems

Feature interactions across space and scales underpin modern visual recognition systems because they introduce beneficial visual contexts. Conventionally, spatial contexts are passively hidden in the CNN’s increasing receptive fields or actively encoded by non-local convolution. Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales. To this end, we propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT). It transforms any feature pyramid into another feature pyramid of the same size but with richer contexts, by using …
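
A drastically simplified picture of one such cross-scale interaction (only a sketch; FPT defines several dedicated transformers and this is not the paper's implementation): let each location of a coarse pyramid level attend over a finer level to gather non-local, cross-scale context.

```python
import torch
import torch.nn.functional as F

def cross_scale_attention(coarse, fine):
    """Let each location of a coarse feature map gather context from a finer map.
    coarse: (B, C, Hc, Wc), fine: (B, C, Hf, Wf); returns a map shaped like `coarse`."""
    b, c, hc, wc = coarse.shape
    q = coarse.flatten(2).transpose(1, 2)     # (B, Hc*Wc, C) queries from the coarse level
    kv = fine.flatten(2).transpose(1, 2)      # (B, Hf*Wf, C) keys/values from the fine level
    attn = F.softmax(q @ kv.transpose(-2, -1) / c ** 0.5, dim=-1)
    out = attn @ kv                           # (B, Hc*Wc, C)
    return out.transpose(1, 2).reshape(b, c, hc, wc)

p4 = torch.randn(1, 256, 16, 16)              # coarse pyramid level
p3 = torch.randn(1, 256, 32, 32)              # finer pyramid level
print(cross_scale_attention(p4, p3).shape)    # torch.Size([1, 256, 16, 16])
```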


Improving Visual Recognition With Unlabeled Data, Aruni Roy Chowdhury Jul 2020


Doctoral Dissertations

The success of deep neural networks has resulted in computer vision systems that obtain high accuracy on a wide variety of tasks such as image classification, object detection, semantic segmentation, etc. However, most state-of-the-art vision systems are dependent upon large amounts of labeled training data, which is not a scalable solution in the long run. This work focuses on improving existing models for visual object recognition and detection without being dependent on such large-scale human-annotated data. We first show how large numbers of hard examples (cases where an existing model makes a mistake) can be obtained automatically from unlabeled video …


Automated Digit Recognition On Sound Pressure Level Meters Based On Deep Learning, Che-Wei Tung May 2020


Theses and Dissertations

A sound pressure level (SPL) meter is a device used for measuring sound pressure levels. The measurement device displays the SPL value in decibels (dB) on a standard LCD screen (no backlight), and the digits shown on the screen can be used to make adjustments or evaluations. SPL meters are therefore widely used in several fields to quantify different kinds of noise, such as industrial, environmental, and aircraft noise. However, to the best of my knowledge, no previous study has used machine learning to automatically recognize the digits on an SPL meter. This thesis presents a novel system …


Improving Face Clustering In Videos, Souyoung Jin Mar 2020


Doctoral Dissertations

Human faces represent not only a challenging recognition problem for computer vision, but are also an important source of information about identity, intent, and state of mind. These properties make the analysis of faces important not just as algorithmic challenges, but as a gateway to developing computer vision methods that can better follow the intent and goals of human beings. In this thesis, we are interested in face clustering in videos. Given a raw video, with no caption or annotation, we want to group all detected faces by their identity. We address three problems in the area of face clustering …


Any-Shot Object Detection, Shafin Rahman, Salman Khan, Nick Barnes, Fahad Shahbaz Khan Mar 2020


Computer Vision Faculty Publications

Previous work on novel object detection considers zero-shot or few-shot settings where none or few examples of each category are available for training. In real-world scenarios, it is less practical to expect that ‘all’ the novel classes are either unseen or have few examples. Here, we propose a more realistic setting termed ‘Any-shot detection’, where totally unseen and few-shot categories can simultaneously co-occur during inference. Any-shot detection offers unique challenges compared to conventional novel object detection, such as a high imbalance between unseen, few-shot and seen object classes, susceptibility to forgetting base training while learning novel classes, and distinguishing novel classes …


Computer Vision Gesture Recognition For Rock Paper Scissors, Nicholas Hunter Jan 2020


Senior Independent Study Theses

This project implements a human-versus-computer game of rock-paper-scissors using machine learning and computer vision. The player’s hand gestures are detected in single images with the YOLOv3 object detection system. This provides a generalized detection method which can recognize player moves without the need for a special background or lighting setup. Additionally, past moves are examined in context to predict the most probable next move of the system’s opponent. In this way, the system achieves higher win rates against human opponents than a purely random strategy would.
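
The move-prediction component can be pictured as a first-order Markov model over the opponent's move history; the sketch below is illustrative, and the project's actual prediction strategy may differ:

```python
import random
from collections import Counter, defaultdict

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def next_counter_move(history):
    """Predict the opponent's next move from move-to-move transition counts
    and return the move that beats the prediction."""
    if len(history) < 2:
        return random.choice(list(BEATS))
    transitions = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    last = history[-1]
    if transitions[last]:
        predicted = transitions[last].most_common(1)[0][0]
    else:
        predicted = random.choice(list(BEATS))
    return BEATS[predicted]

print(next_counter_move(["rock", "rock", "paper", "rock", "rock"]))  # counters the likely "rock"
```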


Generating Expensive Relationship Features From Cheap Objects, Xiaogang Wang, Qianru Sun, Tat-Seng Chua, Marcelo Ang Sep 2019


Research Collection School Of Computing and Information Systems

We investigate the problem of object relationship classification of visual scenes. For a relationship object1-predicate-object2 that captures the object interaction, the representation is composed by combining the object1 and object2 features. As a result, relationship classification models are usually biased toward frequent objects, leading to poor generalization to rare or unseen objects. Inspired by data augmentation methods, we propose a novel Semantic Transform Generative Adversarial Network (ST-GAN) that synthesizes relationship features for rare objects, conditioned on the features from random instances of the objects. Specifically, ST-GAN essentially offers a semantic transform function from cheap object features to expensive …


Robust Lightweight Object Detection, Siddharth Kumar May 2019


Master's Projects

Object detection is a very challenging problem in computer vision and has been a prominent subject of research for nearly three decades. There has been a promising increase in the accuracy and performance of object detectors ever since deep convolutional networks (CNNs) were introduced. CNNs can be trained on large datasets made of high-resolution images without flattening them, thereby using the spatial information. Their superior learning ability also makes them ideal for image classification and object detection tasks. Unfortunately, this power comes at a significant cost in compute and memory. For instance, the Faster R-CNN detector required …