From machine readable dictionaries to lexical databases: the CONCEDE experience

Erjavec, T., Evans, R.P., Ide, N. and Kilgarriff, A. (2003) From machine readable dictionaries to lexical databases: the CONCEDE experience. In: COMPLEX 2003, 7th Conference on Computational Lexicography and Text Research, Budapest, Hungary.

Full text not available from this repository.

Abstract

It is commonly held that machine-readable dictionaries play a key role in bootstrapping effective wide-coverage language technology, especially in less well-resourced languages. However, while the linguistic knowledge they contain is clearly necessary for this goal, it is far from clear that the format in which it is presented is sufficient to reach it. A crucial step in the deployment of such resources is to map them into lexical databases with standardised and well-understood structure and semantics. Furthermore, considerable additional benefits are obtained if such structure and semantics are shared with other linguistic resources. Achieving such a goal, however, is often not an easy task. This paper describes how such a mapping was carried out in the CONCEDE project for six Central and Eastern European languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene) for which few wide-coverage lexical resources had previously been available. In a two-stage process, the machine-readable data for each language was first mapped into broadly compatible, TEI-compliant SGML representations, and then these representations were harmonised into a single XML scheme. The resulting framework offers a concise, flexible lexical database specification, with a demonstrable ability to cope with a diverse range of dictionary and language requirements, and lexical resources suitable for monolingual and multilingual applications.

Item Type: Contribution to conference proceedings in the public domain (Full Paper)
Subjects: Q000 Languages and Literature - Linguistics and related subjects > Q100 Linguistics
Faculties: Faculty of Science and Engineering > School of Computing, Engineering and Mathematics > Natural Language Technology
Depositing User: Helen Webb
Date Deposited: 17 Nov 2007
Last Modified: 15 Jun 2012 12:59
URI: http://eprints.brighton.ac.uk/id/eprint/3193
