The Charniak and Charniak-Johnson Reranking parsers are syntactic parsers of natural languages (e.g. English), both developed at the Brown Laboratory for Linguistic Information Processing (BLLIP); see their resource page for more information (the parsers can be downloaded directly from ftp://ftp.cs.brown.edu/pub/nlparser).
Both parsers are implemented in C++ and distributed as source code, so they have to be compiled before use. On modern machines the compilation can fail, and this note describes ways around the problems I ran into.
On Linux platforms (I tested on Ubuntu 11.04) a possible solution, reported on the corpora mailing list, is to use the g++ 3.3 compiler instead of the now commonly used 4.x version. An alternative is to use the patch prepared by Nitin Madnani; this changes the source code of the reranking parser so that it can compile on modern 64-bit Linux distributions.
On Mac OS X we can compile both parsers by making small changes to the source code
(thanks to Brett Powley for investigating this with me).
In the case of the Charniak parser (parser05Aug16) you need to add
To compile the content of the
you need to change part of line 310 in
This is necessary because the default build is now 64-bit. There a (void*) is 8 bytes wide and an (int) only 4, so the compiler rejects the direct cast as losing precision, whereas converting to (long) first and then to (int) compiles.
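To make the cast issue concrete, here is a minimal sketch of the pattern, not the parser's actual code. The function names are hypothetical and chosen only for illustration. It assumes an LP64 platform (64-bit Linux or Mac OS X), where long has the same width as a pointer, and a compiler that provides <cstdint> (older g++ versions may need -std=c++0x or -std=c++11).

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names come from
// the parser sources.

// Pack a small integer into a void*, as is often done for thread arguments.
void* pointer_from_id(int id) {
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(id));
}

// The workaround described in this note: widen to long first, then narrow.
// On LP64 systems long is pointer-sized, so the first cast is accepted.
// Writing (int)arg directly is rejected by 64-bit g++ because it loses precision.
int id_from_pointer(void* arg) {
    long wide = (long)arg;
    assert(wide >= INT_MIN && wide <= INT_MAX);  // the value must fit in an int
    return (int)wide;
}

// Variant that does not assume long is pointer-sized (an alternative, not
// what the note's patch does): std::intptr_t is defined to hold a pointer.
int id_from_pointer_portable(void* arg) {
    return static_cast<int>(reinterpret_cast<std::intptr_t>(arg));
}
```

With these definitions, id_from_pointer(pointer_from_id(7)) gives back 7. The long-based version matches the fix used here; the intptr_t variant is what I would reach for if the code also had to build on platforms where long is only 32 bits.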
In the case of the reranking parser (reranking-parserAug06)
change line 128 of
from
int id = (int)arg;
to
int id = (long)arg;
Before compiling the parser, however, we need to clean the build tree; otherwise we are likely to get the following error when trying to run the parser:
./parse.sh: line 6: second-stage/programs/features/best-parses: cannot execute binary file
This happens because second-stage/programs/features/best-parses is not recompiled; run make clean before compiling.
Then, when trying to run the reranking parser I got two error messages from
zcat: second-stage/models/ec50spfinal/features.gz.Z: No such file or directory
zcat: second-stage/models/ec50spfinal/cvlm-l1c10P1-weights.gz.Z: No such file or directory
although the file
and these files existed in the relevant directory.
This did not seem to affect the parsing results. Nevertheless, renaming the files (adding the
and making the corresponding changes to
reranking-parser/parse.sh stopped zcat from complaining.
Alternatively, you can use gzcat instead of zcat in the Makefiles.
If you want to run the parser from a directory other than
parse.sh script and add `dirname $0`/ before each call and directory, so that the complete command is:
`dirname $0`/first-stage/PARSE/parseIt -l399 -N50 `dirname $0`/first-stage/DATA/EN/ $* | `dirname $0`/second-stage/programs/features/best-parses -l `dirname $0`/$MODELDIR/features.gz.Z `dirname $0`/$MODELDIR/$ESTIMATORNICKNAME-weights.gz.Z