Computable News
The Computable News project is a collaboration with Fairfax Digital, Australia's leading provider of online news and classifieds. The project applies current approaches to entity and event tracking to help journalists and news consumers find relevant stories and research news archives. We are currently exploring named entity linking and user-driven event extraction.
Publications
Ben Hachey, Will Radford, Joel Nothman, Matthew Honnibal and James R. Curran (to appear). Evaluating Entity Linking with Wikipedia. Artificial Intelligence.
Ben Hachey, Will Radford and James
R. Curran (2011). Graph-based Named Entity Linking with
Wikipedia. In: Proceedings of the 12th International Conference on Web
Information Systems Engineering, Sydney, NSW, Australia.
[Official: springer, Preprint: pdf]
Will Radford, Ben Hachey, Joel Nothman, Matthew Honnibal and
James Curran (2010). Document-level entity linking: CMCRC at TAC
2010. In: Proceedings of the Text Analysis Conference,
Gaithersburg, MD, USA.
[pdf]
Financial Text Analytics
The Capital Markets Cooperative Research Centre develops and commercialises technological solutions that enhance the efficiency and integrity of capital markets. The focus of the Financial Text Analytics project is on improving financial surveillance technology. Existing systems alert analysts to suspicious trading events based on trading data. Our aim is to identify explainable false positives (e.g., caused by price-sensitive information in company news) and explainable true positives (e.g., caused by ramping in forums) by aligning these alerts with publicly available information. We are also exploring the problem of tracking information flow from official sources (e.g., company announcements) to news.
Publications
Will Radford, Ben Hachey, James R. Curran and Maria
Milosavljevic (2010). Tracking Information Flow between Primary
and Secondary News Sources. In: NAACL Workshop on
Computational Linguistics in a World of Social Media, Los
Angeles, CA, USA.
[pdf]
Maria Milosavljevic, Jean-Yves Delort, Ben Hachey, Bavani
Arunasalam, Will Radford and James Curran (2009). Automating
Financial Surveillance. In: Proceedings of the Workshop on
Mining User-Generated Content for Security, Venice,
Italy.
[pdf]
Will Radford, Ben Hachey, James R. Curran and Maria
Milosavljevic (2009). Tracking Information Flow in Financial
Text. In: Proceedings of the Australasian Language
Technology Workshop, Sydney, NSW, Australia.
[pdf]
Edinburgh and Stanford Information Extraction (EASIE)
My PhD work on generic relation extraction was funded by the Edinburgh-Stanford Link and carried out at the University of Edinburgh as part of the EASIE project. This work explored minimally supervised methods for relation extraction and demonstrated their utility for extractive summarisation. I also worked on summarisation using SVD-reduced distributional representations of word semantics and on committee-based active learning for information extraction from conference/workshop announcements.
Selected Publications
Ben Hachey, Claire Grover and Richard Tobin (to
appear). Datasets for Generic Relation Extraction.
Journal of Natural Language Engineering
(2011).
[Official: doi,
Preprint: pdf]
Ben Hachey (2009). Generic Relation Identification: Models
and evaluation. In: Australasian Language Technology
Workshop, Sydney, NSW, Australia.
[pdf]
Ben Hachey (2009). Multi-Document Summarisation Using Generic
Relation Extraction. In: Proceedings of the Conference on
Empirical Methods in Natural Language Processing,
Singapore.
[pdf]
Ben Hachey (2009). Towards Generic Relation Extraction.
PhD Thesis, University of Edinburgh.
[pdf]
Ben Hachey, Gabriel Murray and David Reitter (2006).
Dimensionality Reduction Aids Term Co-Occurrence based
Multi-Document Summarisation. In: Proceedings of the ACL
2006 Task-Focused Summarization and Question Answering
Workshop, Sydney, NSW, Australia.
[pdf]
Ben Hachey, Markus Becker, Claire Grover, and Ewan Klein
(2005). Selective Sampling for Information Extraction with a
Committee of Classifiers. In: PASCAL 2005 Challenge Workshop
on Evaluating Machine Learning for Information Extraction,
Southampton, UK.
[Slides: pdf; System
Description: pdf]
Flexible Summaries (SUM)
The SUM project at the University of Edinburgh examined the use of rhetorical/discourse structure information and sentence extraction for flexible automatic text summarisation in the legal domain. The HOLJ corpus contains annotation of rhetorical roles and summary sentences and can be downloaded for free.
Selected Publications
Ben Hachey and Claire Grover (2006). Extractive Summarisation
of Legal Texts. Artificial Intelligence and Law,
14(4):305-345.
[Official: doi;
Preprint: pdf]
Ben Hachey and Claire Grover (2005). Sequence Modelling for
Sentence Classification in a Legal Summarisation System. In:
Proceedings of the 2005 ACM Symposium on Applied Computing (SAC
2005), Santa Fe, New Mexico, USA.
[Official: acm;
Preprint: pdf]
Ben Hachey and Claire Grover (2005). Sentence Extraction for Legal
Text Summarisation. In: 19th International Joint Conference
on Artificial Intelligence (IJCAI-05), Edinburgh,
Scotland.
[pdf]
Stanford-Edinburgh Entity Recognition (SEER)
The SEER project at the University of Edinburgh explored minimally supervised machine learning of entity recognisers. I worked on bootstrapping NER in astronomy abstracts using active learning and on grounding gene mentions from biomedical text to gene database identifiers.
Selected Publications
Ben Hachey, Beatrice Alex and Markus Becker
(2005). Investigating the Effects of Selective Sampling on the
Annotation Task. In: Proceedings of the 9th Conference on
Computational Natural Language Learning, Ann Arbor, Michigan,
USA.
[pdf]
Markus Becker, Ben Hachey, Beatrice Alex and Claire Grover
(2005). Optimising Selective Sampling for Bootstrapping Named
Entity Recognition. In: Proceedings of the ICML-2005
Workshop on Learning with Multiple Views, Bonn, Germany.
[pdf]
Ben Hachey, Huy Nguyen, Malvina Nissim, Beatrice Alex, and
Claire Grover (2004). Grounding Gene Mentions with Respect to
Gene Database Identifiers. In: BioCreAtIvE Workshop
(Critical Assessment of Information Extraction Systems in
Biology), Granada, Spain.
[pdf]
CROSS-lingual Multi-Agent Retail Comparison (CROSSMARC)
The CROSSMARC project at the University of Edinburgh developed modular techniques for cross-lingual e-retail comparison that can be adapted rapidly to new product domains. My contribution was extracting product details from web pages.
Selected Publications
G. Petasis, V. Karkaletsis, C. Grover, B. Hachey, M-T. Pazienza,
M. Vindigni, J. Coch (2004). Adaptive, Multilingual Named Entity
Recognition in Web Pages. In: Proceedings of the 16th
European Conference on Artificial Intelligence (ECAI 2004),
Valencia, Spain.
[pdf]
V. Karkaletsis, C.D. Spyropoulos, D. Souflis, C. Grover,
B. Hachey, M-T. Pazienza, M. Vindigni, E. Cartier, J. Coch
(2003). Demonstration of the CROSSMARC System. In: Human
Language Technology Conference (HLT/NAACL-2003), Edmonton, AB,
Canada.
[pdf]
Spoken Language User Interface Toolkit (SLUITK)
The Spoken Language User Interface Toolkit (SLUITK) project at BCL Technologies built software that helps programmers write programs that understand, process, and act upon spoken natural language input. During a summer internship in 2002, I designed and implemented C++ text normalisation code for the SLUITK.