Learning to Generate Descriptions of Visual Data Anchored in Spatial Relations

Muscat, Adrian and Belz, Anja (2017) Learning to Generate Descriptions of Visual Data Anchored in Spatial Relations. IEEE Computational Intelligence Magazine, 12 (3). pp. 29-42. ISSN 1556-603X

muscat-belz-ieee-cim-FINAL.pdf - Accepted Version

Abstract

The explosive growth of visual data both online and offline in private and public repositories has led to urgent requirements for better ways to index, search, retrieve, process and manage visual content. Automatic methods for generating image descriptions can help with all these tasks as well as playing an important role in assistive technology for the visually impaired. The task we address in this paper is the automatic generation of image descriptions that are anchored in spatial relations. We construe this as a three-step task where the first step is to identify objects in an image; the second step detects spatial relations between object pairs on the basis of language and visual features; and in the third step, the spatial relations are mapped to natural language (NL) descriptions. We describe the data we have created, and compare a range of machine learning methods in terms of the success with which they learn the mapping from features to spatial relations, using automatic and human-assessed evaluations. We find that a random forest model performs best by a substantial margin. We examine aspects of our approach in more detail, including data annotation and choice of features. For Step 3, we describe six alternative natural language generation (NLG) strategies and evaluate the resulting NL strings using measures of correctness, naturalness and completeness. Finally, we discuss evaluation issues, including the importance of extrinsic context in data creation and evaluation design.
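
The three-step task described in the abstract can be pictured with a toy sketch. It is not the authors' system. The paper learns Step 2 from language and visual features and reports that a random forest performs best; here a hand-written geometric rule stands in for that learned classifier. The class DetectedObject, the feature names dx and dy, the three relation labels and the sentence template are all assumptions made for illustration, and Step 1 (object detection) is assumed to have already produced labelled bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """Assumed output of Step 1: a class label plus a bounding box."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels; y grows downward


def pair_features(a, b):
    """Toy geometric features for the ordered pair (a, b).

    Hypothetical feature set: the offset of b's box centre from a's.
    """
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    return {
        "dx": (bx0 + bx1) / 2 - (ax0 + ax1) / 2,
        "dy": (by0 + by1) / 2 - (ay0 + ay1) / 2,
    }


def predict_relation(features):
    """Step 2 stand-in: a hand-written rule, NOT a trained model."""
    if abs(features["dx"]) >= abs(features["dy"]):
        return "next to"
    # dy > 0 means b's centre is lower in the image, so a is above b.
    return "above" if features["dy"] > 0 else "under"


def describe(a, b):
    """Step 3: map the predicted relation to a sentence with one template."""
    relation = predict_relation(pair_features(a, b))
    return f"The {a.label} is {relation} the {b.label}."


if __name__ == "__main__":
    dog = DetectedObject("dog", (10, 50, 40, 80))
    sofa = DetectedObject("sofa", (60, 40, 120, 90))
    print(describe(dog, sofa))
```

In a learned version of Step 2, the rule in predict_relation would be replaced by a classifier trained on annotated object pairs, and the single template in describe would be one of several possible generation strategies.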

Item Type: Journal article
Additional Information: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects: G000 Computing and Mathematical Sciences > G700 Artificial Intelligence > G760 Machine Learning
G000 Computing and Mathematical Sciences > G700 Artificial Intelligence > G710 Speech & natural language processing
?? G740 ??
DOI (a stable link to the resource): 10.1109/MCI.2017.2708559
Depositing User: Converis
Date Deposited: 11 May 2017 03:02
Last Modified: 03 Aug 2017 12:20
URI: http://eprints.brighton.ac.uk/id/eprint/16878
