Neuroscience and Neurobiology | Open Access Articles

Predicting Fixations From Deep And Low-Level Features, Matthias Kümmerer, Thomas S.A. Wallis, Leon A. Gatys, Matthias Bethge

MODVIS Workshop

Learning what properties of an image are associated with human gaze placement is important both for understanding how biological systems explore the environment and for computer vision applications. Recent advances in deep learning enable us, for the first time, to explain a significant portion of the information expressed in the spatial fixation structure. Our saliency model DeepGaze II uses the VGG network (trained on object recognition in the ImageNet challenge) to convert an image into a high-dimensional feature space, which is then read out by a second, very simple network to yield a density prediction. DeepGaze II is right now the …

How Deep Is The Feature Analysis Underlying Rapid Visual Categorization?, Sven Eberhardt, Jonah Cader, Thomas Serre

MODVIS Workshop

Rapid categorization paradigms have a long history in experimental psychology: Characterized by short presentation times and fast behavioral responses, these tasks highlight both the speed and ease with which our visual system processes natural object categories. Previous studies have shown that feed-forward hierarchical models of the visual cortex provide a good fit to human visual decisions. At the same time, recent work has demonstrated significant gains in object recognition accuracy with increasingly deep hierarchical architectures: From AlexNet to VGG to Microsoft CNTK – the field of computer vision has championed both depth and accuracy. But it is unclear how well …

Using Deep Features To Predict Where People Look, Matthias Kümmerer, Matthias Bethge

MODVIS Workshop

When observers free-view scenes, their first few fixations are driven in part by bottom-up attention. We seek to characterize this process by extracting all information from images that can be used to predict fixation densities (Kümmerer et al., PNAS, 2015). If we ignore time and observer identity, the average amount of information is slightly larger than 2 bits per image for the MIT 1003 dataset. The minimum amount of information is 0.3 bits and the maximum is 5.2 bits. Before the rise of deep neural networks, the best models were able to capture one third of this information on average. …
