Datasets for Generic Relation Extraction
Datasets for Generic Relation Extraction consists of publicly available corpora (ACE 2004, ACE 2005 and BioInfer) that have been standardised for evaluation of multi-type relation extraction across domains. Annotation includes: 1) refactoring of the original data to a common XML document type, 2) linguistic information from LT-TTT and Minipar, and 3) normalised IE markup that complies with a simple, intuitive notion of what constitutes a relation across domains.
Further details and download can be found here.
HOLJ Argumentative Zoning and Summarisation Corpus
The HOLJ Corpus consists of 188 House of Lords judgments from the years 2001-2003. Annotation includes: 1) shallow linguistic information from the LT-TTT tools, 2) manual annotation of sentence-level rhetorical roles, and 3) manual annotation of extract-worthiness.
Further details and download can be found here.