Perceptual narratives of space and motion for semantic interpretation of visual data

Jakob Suchan, Mehul Bhatt, Paulo E. Santos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Citations (Scopus)


We propose a commonsense theory of space and motion for the high-level semantic interpretation of dynamic scenes. The theory provides primitives for commonsense representation and reasoning with qualitative spatial relations, depth profiles, and spatio-temporal change; these may be combined with probabilistic methods for modelling and hypothesising event and object relations. The proposed framework has been implemented as a general activity abstraction and reasoning engine, which we demonstrate by generating declaratively grounded visuo-spatial narratives of perceptual input from vision and depth sensors for a benchmark scenario. Our long-term goal is to provide general tools (integrating different aspects of space, action, and change) necessary for tasks such as realtime human activity interpretation and dynamic sensor control within the purview of cognitive vision, interaction, and control.
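The qualitative spatial relations and spatio-temporal change mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the box geometry, relation names, and helper functions below are assumptions, in the spirit of RCC-style topological relations and transition detection over object tracks.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned 2D bounding box (hypothetical object abstraction)."""
    x1: float
    y1: float
    x2: float
    y2: float

def relation(a: Box, b: Box) -> str:
    """Classify a coarse qualitative spatial relation between two boxes."""
    # No overlap on either axis: the regions are disconnected.
    if a.x2 < b.x1 or b.x2 < a.x1 or a.y2 < b.y1 or b.y2 < a.y1:
        return "disconnected"
    # a fully encloses b.
    if a.x1 <= b.x1 and a.y1 <= b.y1 and a.x2 >= b.x2 and a.y2 >= b.y2:
        return "contains"
    # b fully encloses a.
    if b.x1 <= a.x1 and b.y1 <= a.y1 and b.x2 >= a.x2 and b.y2 >= a.y2:
        return "inside"
    return "overlapping"

def changes(track_a: list, track_b: list) -> list:
    """Detect qualitative spatio-temporal change: frames at which the
    relation between two object tracks transitions."""
    rels = [relation(a, b) for a, b in zip(track_a, track_b)]
    return [(t, r1, r2)
            for t, (r1, r2) in enumerate(zip(rels, rels[1:]))
            if r1 != r2]
```

A sequence of such relation transitions is one simple form the paper's "perceptual narrative" of a dynamic scene could take, over which event hypotheses can then be formed.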

Original language: English
Title of host publication: Computer Vision - ECCV 2014 Workshops, Proceedings
Editors: Carsten Rother, Michael M. Bronstein, Lourdes Agapito
Place of Publication: Cham
Number of pages: 16
ISBN (Electronic): 9783319161815
ISBN (Print): 9783319161808
Publication status: Published - 20 Mar 2015
Externally published: Yes
Event: 13th European Conference on Computer Vision, ECCV 2014 - Zurich, Switzerland
Duration: 6 Sept 2014 - 12 Sept 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 13th European Conference on Computer Vision, ECCV 2014


Keywords

  • Logic Programming
  • Spatial Change
  • Semantic Interpretation
  • Dynamic Scene
  • Twilight Zone


