Optimisation of corpus-derived probabilistic grammars



Belz, Anja (2001) Optimisation of corpus-derived probabilistic grammars. In: Corpus Linguistics 2001, 30 Mar - 2 Apr 2001, Lancaster University, UK.




This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practica
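As a rough illustration of the first two steps described in the abstract, the sketch below reads rules off bracketed parse trees and estimates rule probabilities by relative frequency, then repeats the estimate with each nonterminal annotated by its parent category. Several things here are assumptions rather than details taken from the paper: the abstract does not say which derivation technique or which kinds of local structural context were used, so relative-frequency estimation and parent annotation stand in as one plausible instance of each; the two toy trees are invented and are not WSJ data; and the function names are hypothetical.

```python
from collections import Counter


def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume the closing bracket
            return (label, children)
        word = tokens[pos]  # a terminal (word) is kept as a plain string
        pos += 1
        return word

    return read()


def rules(tree, parent=None, annotate=False):
    """Yield (lhs, rhs) rules; with annotate=True, label X under P becomes 'X^P'."""
    label, children = tree
    lhs = f"{label}^{parent}" if annotate and parent else label
    rhs = []
    for child in children:
        if isinstance(child, tuple):
            rhs.append(f"{child[0]}^{label}" if annotate else child[0])
            yield from rules(child, label, annotate)
        else:
            rhs.append(child)
    yield (lhs, tuple(rhs))


def estimate_pcfg(treebank, annotate=False):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for line in treebank:
        for lhs, rhs in rules(parse_tree(line), annotate=annotate):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Invented toy treebank (assumption: not taken from the WSJ Corpus).
toy_treebank = [
    "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))",
    "(S (NP (NN dogs)) (VP (VBP bark)))",
]

basic = estimate_pcfg(toy_treebank)
annotated = estimate_pcfg(toy_treebank, annotate=True)
```

Annotating categories with their parent splits each nonterminal into context-specific variants, which generally enlarges the rule set. That growth is one reason a later step that trades grammar size against parsing performance, as the abstract describes, can be worthwhile.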

Item Type: Contribution to conference proceedings in the public domain (Full Paper)
Subjects: Q000 Languages and Literature - Linguistics and related subjects > Q100 Linguistics
Faculties: Faculty of Science and Engineering > School of Computing, Engineering and Mathematics > Natural Language Technology
Depositing User: Converis
Date Deposited: 15 Nov 2007
Last Modified: 25 Feb 2015 14:53
URI: http://eprints.brighton.ac.uk/id/eprint/3169
