Unsupervised alignment of comparable data and text resources



Belz, Anja and Kow, Eric (2011) Unsupervised alignment of comparable data and text resources. In: Proceedings of the 4th Workshop on Building and Using Comparable Corpora, 49th Annual Meeting of the Association for Computational Linguistics, 19-24 June 2011, Portland, Oregon, USA.

Full text not available from this repository.


In this paper we investigate automatic data-text alignment, i.e. the task of automatically aligning data records with textual descriptions such that data tokens are aligned with the word strings that describe them. Our methods use log-likelihood ratios to estimate the strength of association between data tokens and text tokens. We investigate data-text alignment at the document level and at the sentence level, reporting results for several methodological variants as well as baselines. We find that log-likelihood ratios provide a strong basis for predicting data-text alignment.
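
The abstract does not say which log-likelihood ratio formulation is used, so the following is only an illustrative sketch. It assumes the common G² statistic over a 2x2 co-occurrence table, counted over paired data records and texts. The function names, the toy tokens such as `wind=N`, and the pairing scheme are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    k11: pairs containing both the data token and the text token
    k12: pairs containing the data token only
    k21: pairs containing the text token only
    k22: pairs containing neither
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def association_scores(pairs):
    """Score every co-occurring (data token, text token) combination.

    `pairs` is a list of (data_tokens, text_tokens) tuples, one per
    aligned unit (a document or a sentence). Assumed input format.
    """
    n = len(pairs)
    data_df, text_df, joint = Counter(), Counter(), Counter()
    for data_tokens, text_tokens in pairs:
        d, t = set(data_tokens), set(text_tokens)
        data_df.update(d)
        text_df.update(t)
        joint.update((x, y) for x in d for y in t)
    scores = {}
    for (x, y), k11 in joint.items():
        k12 = data_df[x] - k11
        k21 = text_df[y] - k11
        k22 = n - k11 - k12 - k21
        scores[(x, y)] = log_likelihood_ratio(k11, k12, k21, k22)
    return scores


# Hypothetical toy corpus: each data record is paired with one short text.
toy_pairs = [
    (["wind=N"], ["northerly", "winds"]),
    (["wind=S"], ["southerly", "winds"]),
    (["wind=N"], ["northerly", "gusts"]),
    (["wind=S"], ["southerly", "gusts"]),
]
scores = association_scores(toy_pairs)
```

In this toy corpus, `("wind=N", "northerly")` scores higher than `("wind=N", "winds")`, because "winds" appears equally often with both wind directions. Note that G² is also large for strongly negative associations, so a practical system would additionally check whether the observed co-occurrence count exceeds the expected one.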

Item Type: Contribution to conference proceedings in the public domain (Full Paper)
Subjects: Q000 Languages and Literature - Linguistics and related subjects > Q100 Linguistics
Faculties: Faculty of Science and Engineering > School of Computing, Engineering and Mathematics > Natural Language Technology
Depositing User: Converis
Date Deposited: 21 Feb 2012 11:59
Last Modified: 25 Mar 2015 12:08
URI: http://eprints.brighton.ac.uk/id/eprint/9900
