Comparing automatic and human evaluation of NLG systems


Belz, Anja and Reiter, Ehud (2006) Comparing automatic and human evaluation of NLG systems. In: 11th Conference of the European Chapter of the Association for Computational Linguistics, April 3-7, 2006, Trento, Italy.

nlgeval-final-2.pdf - Published Version (63kB). Restricted to registered users only.

We consider the evaluation problem in Natural Language Generation (NLG) and present results for evaluating several NLG systems with similar functionality, including a knowledge-based generator and several statistical systems. We compare evaluation results for these systems by human domain experts, human non-experts, and several automatic evaluation metrics, including NIST, BLEU, and ROUGE. We find that NIST scores correlate best (>0.8) with human judgments, but that all automatic metrics we examined are biased in favour of generators that select on the basis of frequency alone. We conclude that automatic evaluation of NLG systems has considerable potential, in particular where high-quality reference texts and only a small number of human evaluators are available. However, in general it is probably best for automatic evaluations to be supported by human-based evaluations, or at least by studies that demonstrate that a particular metric correlates well with human judgments in a given domain.
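The correlation between automatic metrics and human judgments that the abstract reports can be illustrated with a minimal sketch. The per-system scores below are hypothetical, not the paper's data; the Pearson coefficient is computed from first principles with the standard library only.

```python
# Hypothetical per-system scores (NOT the paper's data): an automatic
# metric score (e.g. NIST) and a mean human rating for five NLG systems.
metric_scores = [5.1, 4.3, 6.0, 3.8, 5.5]
human_ratings = [3.9, 3.2, 4.5, 2.9, 4.1]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")  # a value > 0.8 would mirror the paper's finding
```

In an actual replication, `metric_scores` would come from running NIST/BLEU/ROUGE against reference texts and `human_ratings` from expert or non-expert judgments, with one pair per system being compared.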

Item Type: Contribution to conference proceedings in the public domain (Full Paper)
Uncontrolled Keywords: Natural language generation systems
Subjects: G000 Computing and Mathematical Sciences > G400 Computing
Q000 Languages and Literature - Linguistics and related subjects > Q100 Linguistics
DOI (a stable link to the resource):
Faculties: Faculty of Science and Engineering > School of Computing, Engineering and Mathematics > Natural Language Technology
Depositing User: Converis
Date Deposited: 14 Nov 2007
Last Modified: 25 Feb 2015 14:48
