Protocols from perceptual observations

Chris J. Needham, Paulo E. Santos, Derek R. Magee, Vincent Devin, David C. Hogg, Anthony G. Cohn

Research output: Contribution to journal › Article › peer-review

38 Citations (Scopus)


This paper presents a cognitive vision system capable of autonomously learning protocols from perceptual observations of dynamic scenes. The work is motivated by the aim of creating a synthetic agent that can observe a scene containing interactions between unknown objects and agents, and learn models of these sufficient to act in accordance with the implicit protocols present in the scene. Discrete concepts (utterances and object properties), and temporal protocols involving these concepts, are learned in an unsupervised manner from continuous sensor input alone. Crucial to this learning process are methods for spatio-temporal attention applied to the audio and visual sensor data. These identify subsets of the sensor data relating to discrete concepts. Clustering within continuous feature spaces is used to learn object property and utterance models from processed sensor data, forming a symbolic description. The Progol Inductive Logic Programming system is subsequently used to learn symbolic models of the temporal protocols, in the presence of noise and over-representation in the symbolic data input to it. The models learned are used to drive a synthetic agent that can interact with the world in a semi-natural way. The system has been evaluated in the domain of table-top game playing and has been shown to be successful at learning protocol behaviours in such real-world audio-visual environments.
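The first stage of the pipeline the abstract describes — unsupervised clustering of continuous feature vectors to induce discrete symbols that can then feed a symbolic learner such as Progol — can be sketched with a minimal k-means step. This is an illustrative sketch, not the authors' implementation: the feature values, cluster count, and symbol names (`prop_0`, `prop_1`) are all assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels

# Toy continuous "object property" features (e.g. colour descriptors);
# two implied classes, chosen only to illustrate the symbol-formation step.
features = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.95, 0.85)]
centroids, labels = kmeans(features, k=2)

# Map cluster indices to discrete symbols for the symbolic layer.
symbols = [f"prop_{lab}" for lab in labels]
```

In the paper's setting these induced symbols, together with temporal relations between them, would form the ground facts handed to the ILP system; here they simply demonstrate how continuous sensor features become a discrete vocabulary.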

Original language: English
Pages (from-to): 103-136
Number of pages: 34
Journal: Artificial Intelligence
Issue number: 1-2
Publication status: Published - Sept 2005
Externally published: Yes


Keywords:
  • Autonomous learning
  • Cognitive vision
  • Inductive logic programming
  • Spatio-temporal reasoning
  • Symbol grounding
  • Unsupervised clustering


