Extracting parallel fragments from comparable corpora for data-to-text generation

Belz, Anja and Kow, Eric (2010) Extracting parallel fragments from comparable corpora for data-to-text generation. In: Proceedings of the 6th International Natural Language Generation Conference (INLG'10), July 7-9, 2010, Dublin, Ireland.

Full text not available from this repository.

Abstract

Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs) which do not generally occur naturally. In this paper, we investigate the idea of automatically extracting parallel resources for data-to-text generation from comparable corpora obtained from the Web. We describe our comparable corpus of data and texts relating to British hills and the techniques for extracting paired input/output fragments we have developed so far.

Item Type: Contribution to conference proceedings in the public domain (Full Paper)
Subjects: Q000 Languages and Literature - Linguistics and related subjects > Q100 Linguistics
DOI (a stable link to the resource): 10.1.1.180.3640
Faculties: Faculty of Science and Engineering > School of Computing, Engineering and Mathematics > Natural Language Technology
Depositing User: Converis
Date Deposited: 07 Jan 2011 12:21
Last Modified: 07 Feb 2013 03:05
URI: http://eprints.brighton.ac.uk/id/eprint/8058
