Feature selection using averaged perceptron and greedy feature selection
========================================================================

Mark Johnson, 19th November 2009

make eval-reranker -f Makefile.Berkeley VERSION=nonfinal \
    NBESTPARSERBASEDIR=BerkeleyParser NBESTPARSERNICKNAME=berkeley \
    FEATUREEXTRACTOR=second-stage/programs/features/extract-spfeatures FEATURESNICKNAME=sp \
    ESTIMATOR=second-stage/programs/wlle/gavper ESTIMATORNICKNAME=gavper \
    FEATUREEXTRACTORFLAGS="-l -c -i -s 5" ESTIMATORFLAGS="-n 10 -d 10 -F 1 -m 0"
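
These notes do not describe the search procedure that gavper follows, so the following is only a generic sketch of greedy backward elimination over feature classes, not the estimator's actual code. The function name, the toy class names A to D, and the additive toy scoring function are all hypothetical; in a real run the score would come from training and evaluating the reranker on each candidate set.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def greedy_backward_selection(
    classes: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Repeatedly drop the one feature class whose removal improves the score most.

    Generic illustration only; this is an assumption about what "greedy
    feature selection" might look like, not a description of gavper.
    """
    current = frozenset(classes)
    best = score(current)
    while len(current) > 1:
        # Score every set obtained by removing exactly one class.
        candidates = [(score(current - {c}), c) for c in sorted(current)]
        cand_score, cand = max(candidates)
        if cand_score <= best:
            break  # no single removal helps, so stop
        current, best = current - {cand}, cand_score
    return current, best


# Hypothetical stand-in for "train the reranker and measure f-score".
toy_weights = {"A": 0.3, "B": 0.2, "C": -0.1, "D": -0.05}


def toy_score(feature_classes: FrozenSet[str]) -> float:
    return sum(toy_weights[c] for c in feature_classes)


kept, kept_score = greedy_backward_selection(toy_weights, toy_score)
print(sorted(kept), kept_score)
```

With the toy weights, the two classes with negative weight are dropped one at a time and the search stops once no single removal raises the score.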

EC first-stage parser
=====================

Dev scores, best score = 0.909508
features = NLogP Heavy CoLenPar Word WProj NGram NGramTree HeadTree Edges WordEdges
excluded features = RightBranch CoPar Rule Heads
nfeatures = 10

Test1 scores, best score = 0.917422
features = NLogP RightBranch Heavy CoPar Word NGram NGramTree HeadTree Heads Edges WordEdges 
excluded features = CoLenPar WProj Rule
nfeatures = 11

Average Dev and Test1 scores, best score = 0.9131965
features = NLogP RightBranch Heavy CoPar CoLenPar Word WProj Rule NGramTree HeadTree Heads Edges WordEdges
excluded features = NGram
nfeatures = 13


Berkeley first-stage parser
===========================

Dev scores, best score = 0.91006
features = NLogP RightBranch CoPar CoLenPar Word WProj Rule NGram HeadTree Heads WordEdges
excluded features = Heavy NGramTree Edges
nfeatures = 11

Test1 scores, best score = 0.915704
features = NLogP RightBranch Heavy CoPar CoLenPar Word Rule NGram NGramTree Heads Edges WordEdges
excluded features = WProj HeadTree
nfeatures = 12

Average Dev and Test1 scores, best score = 0.912562
features = NLogP RightBranch Heavy CoLenPar Word WProj Rule NGram NGramTree Heads WordEdges
excluded features = CoPar HeadTree Edges
nfeatures = 11
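
As a quick cross-check of the selections above, this small sketch computes which feature classes were kept in all three runs (Dev, Test1, average) for each first-stage parser. The feature lists are copied from the results above; the variable names are illustrative only.

```python
# Selected feature classes, copied from the results listed above.
selected = {
    "EC": {
        "dev": "NLogP Heavy CoLenPar Word WProj NGram NGramTree HeadTree Edges WordEdges",
        "test1": "NLogP RightBranch Heavy CoPar Word NGram NGramTree HeadTree Heads Edges WordEdges",
        "avg": "NLogP RightBranch Heavy CoPar CoLenPar Word WProj Rule NGramTree HeadTree Heads Edges WordEdges",
    },
    "Berkeley": {
        "dev": "NLogP RightBranch CoPar CoLenPar Word WProj Rule NGram HeadTree Heads WordEdges",
        "test1": "NLogP RightBranch Heavy CoPar CoLenPar Word Rule NGram NGramTree Heads Edges WordEdges",
        "avg": "NLogP RightBranch Heavy CoLenPar Word WProj Rule NGram NGramTree Heads WordEdges",
    },
}

# Feature classes that survive selection in every run for a given parser.
stable = {
    parser: set.intersection(*(set(names.split()) for names in runs.values()))
    for parser, runs in selected.items()
}

for parser, feats in stable.items():
    print(parser, sorted(feats))

# Feature classes kept in every run for both parsers.
both = stable["EC"] & stable["Berkeley"]
print("both:", sorted(both))
```

On these lists, Rule is kept in all three Berkeley selections but only in the averaged EC selection.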

Comparing EC and Berkeley on different feature sets and estimator settings
==========================================================================

Mark Johnson, 21st November 2009

Guess: Berkeley parser may benefit more from rule features than EC parser does

Results are f-scores on test1 with the nonfinal feature split and 50-best parses

Features   Estimator              EC      Berkeley

sp         cvlm-l1c10P1           0.918   0.9157
sp         owlqn-l1c10P1t1e-7     0.918
splh       owlqn-l1c10t1e-7       0.918   0.9148
splh       owlqn-l1c10P1p1t1e-7   0.9167  0.9148
spnnn      owlqn-l1c10p1t1e-7     0.915   0.9151
wsall      owlqn-l1c10t1e-7       0.914
splhnn     owlqn-l1c10P1p1t1e-7   0.9155
nfeatures  owlqn-l1c10t1e-7               0.9146
rfeatures  owlqn-l1c10t1e-7               0.9146
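
For the rows where both parsers were scored, the gap can be read off with a few lines of arithmetic. This sketch uses only the four paired rows from the table; the variable names are illustrative.

```python
# (features, estimator) -> (EC f-score, Berkeley f-score); rows with both values only.
paired = {
    ("sp", "cvlm-l1c10P1"): (0.918, 0.9157),
    ("splh", "owlqn-l1c10t1e-7"): (0.918, 0.9148),
    ("splh", "owlqn-l1c10P1p1t1e-7"): (0.9167, 0.9148),
    ("spnnn", "owlqn-l1c10p1t1e-7"): (0.915, 0.9151),
}

# EC minus Berkeley, rounded to the precision reported in the table.
diffs = {key: round(ec - berkeley, 4) for key, (ec, berkeley) in paired.items()}

for (features, estimator), d in diffs.items():
    print(f"{features:6s} {estimator:22s} EC - Berkeley = {d:+.4f}")

ec_ahead = sum(1 for d in diffs.values() if d > 0)
print(f"EC ahead in {ec_ahead} of {len(diffs)} paired rows")
```

EC is ahead in three of the four paired rows; on spnnn the Berkeley score is higher by 0.0001.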

