Computable News

The Computable News project is a collaboration with Fairfax Digital, Australia's leading provider of online news and classifieds. The project applies state-of-the-art entity and event tracking to help journalists and news consumers find relevant stories and research news archives. We are currently exploring named entity linking and user-driven event extraction.

Publications

Ben Hachey, Will Radford, Joel Nothman, Matthew Honnibal and James R. Curran (to appear). Evaluating Entity Linking with Wikipedia. Artificial Intelligence.

Ben Hachey, Will Radford and James R. Curran (2011). Graph-based Named Entity Linking with Wikipedia. In: Proceedings of the 12th International Conference on Web Information System Engineering, Sydney, NSW, Australia.
[Official: springer, Preprint: pdf]

Will Radford, Ben Hachey, Joel Nothman, Matthew Honnibal and James Curran (2010). Document-level entity linking: CMCRC at TAC 2010. In: Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA.
[pdf]

Financial Text Analytics

The Capital Markets Cooperative Research Centre develops and commercialises technological solutions that enhance the efficiency and integrity of capital markets. The focus of the Financial Text Analytics project is on improving financial surveillance technology. Existing systems alert analysts to suspicious trading events based on trading data. Our aim is to identify explainable false positives (e.g., caused by price-sensitive information in company news) and explainable true positives (e.g., caused by ramping in forums) by aligning these alerts with publicly available information. We are also exploring the problem of tracking information flow from official sources (e.g., company announcements) to news.

Publications

Will Radford, Ben Hachey, James R. Curran and Maria Milosavljevic (2010). Tracking Information Flow between Primary and Secondary News Sources. In: NAACL Workshop on Computational Linguistics in a World of Social Media, Los Angeles, CA, USA.
[pdf]

Maria Milosavljevic, Jean-Yves Delort, Ben Hachey, Bavani Arunasalam, Will Radford and James Curran (2009). Automating Financial Surveillance. In: Proceedings of the Workshop on Mining User-Generated Content for Security, Venice, Italy.
[pdf]

Will Radford, Ben Hachey, James R. Curran and Maria Milosavljevic (2009). Tracking Information Flow in Financial Text. In: Proceedings of the Australasian Language Technology Workshop, Sydney, NSW, Australia.
[pdf]

Edinburgh and Stanford Information Extraction (EASIE)

My PhD work on generic relation extraction was funded by the Edinburgh-Stanford Link and carried out at the University of Edinburgh as part of the EASIE project. This work explored minimally supervised methods for relation extraction and demonstrated their utility for extractive summarisation. I also worked on summarisation using SVD-reduced distributional representations of word semantics and on committee-based active learning for information extraction from conference/workshop announcements.

Selected Publications

Ben Hachey, Claire Grover and Richard Tobin (to appear). Datasets for Generic Relation Extraction. Journal of Natural Language Engineering (2011).
[Official: doi, Preprint: pdf]

Ben Hachey (2009). Generic Relation Identification: Models and evaluation. In: Australasian Language Technology Workshop, Sydney, NSW, Australia.
[pdf]

Ben Hachey (2009). Multi-Document Summarisation Using Generic Relation Extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Singapore.
[pdf]

Ben Hachey (2009). Towards Generic Relation Extraction. PhD Thesis, University of Edinburgh.
[pdf]

Ben Hachey, Gabriel Murray and David Reitter (2006). Dimensionality Reduction Aids Term Co-Occurrence based Multi-Document Summarisation. In: Proceedings of the ACL 2006 Task-Focused Summarization and Question Answering Workshop, Sydney, NSW, Australia.
[pdf]

Ben Hachey, Markus Becker, Claire Grover, and Ewan Klein (2005). Selective Sampling for Information Extraction with a Committee of Classifiers. In: PASCAL 2005 Challenge Workshop on Evaluating Machine Learning for Information Extraction, Southampton, UK.
[Slides: pdf; System Description: pdf]

Flexible Summaries (SUM)

The SUM project at the University of Edinburgh examined the use of rhetorical/discourse structure information and sentence extraction for flexible automatic text summarisation in the legal domain. The HOLJ corpus, which is annotated with rhetorical roles and summary sentences, is freely available for download.

Selected Publications

Ben Hachey and Claire Grover (2006). Extractive Summarisation of Legal Texts. Artificial Intelligence and Law, 14(4):305-345.
[Official: doi, Preprint: pdf]

Ben Hachey and Claire Grover (2005). Sequence Modelling for Sentence Classification in a Legal Summarisation System. In: Proceedings of the 2005 ACM Symposium on Applied Computing (SAC 2005), Santa Fe, New Mexico, USA.
[Official: acm, Preprint: pdf]

Ben Hachey and Claire Grover (2005). Sentence Extraction for Legal Text Summarisation. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, Scotland.
[pdf]

Stanford-Edinburgh Entity Recognition (SEER)

The SEER project at the University of Edinburgh explored minimally supervised machine learning of entity recognisers. I worked on bootstrapping NER in astronomy abstracts using active learning and on grounding gene mentions from biomedical text to gene database identifiers.

Selected Publications

Ben Hachey, Beatrice Alex and Markus Becker (2005). Investigating the Effects of Selective Sampling on the Annotation Task. In: Proceedings of the 9th Conference on Computational Natural Language Learning, Ann Arbor, Michigan, USA.
[pdf]

Markus Becker, Ben Hachey, Beatrice Alex and Claire Grover (2005). Optimising Selective Sampling for Bootstrapping Named Entity Recognition. In: Proceedings of the ICML-2005 Workshop on Learning with Multiple Views, Bonn, Germany.
[pdf]

Ben Hachey, Huy Nguyen, Malvina Nissim, Beatrice Alex, and Claire Grover (2004). Grounding Gene Mentions with Respect to Gene Database Identifiers. In: BioCreAtIvE Workshop (Critical Assessment of Information Extraction Systems in Biology), Granada, Spain.
[pdf]

CROSS-lingual Multi-Agent Retail Comparison (CROSSMARC)

The CROSSMARC project at the University of Edinburgh developed modular techniques for cross-lingual e-retail comparison, designed for rapid adaptation to new product domains. My contribution was techniques for extracting product details from web pages.

Selected Publications

G. Petasis, V. Karkaletsis, C. Grover, B. Hachey, M-T. Pazienza, M. Vindigni, J. Coch (2004). Adaptive, Multilingual Named Entity Recognition in Web Pages. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain.
[pdf]

V. Karkaletsis, C.D. Spyropoulos, D. Souflis, C. Grover, B. Hachey, M-T. Pazienza, M. Vindigni, E. Cartier, J. Coch (2003). Demonstration of the CROSSMARC System. In: Human Language Technology Conference (HLT/NAACL-2003), Edmonton, AB, Canada.
[pdf]

Spoken Language User Interface Toolkit (SLUITK)

The Spoken Language User Interface Toolkit (SLUITK) project at BCL Technologies developed software to help programmers build applications that can understand, process, and act upon spoken natural language input. During a summer internship in 2002, I designed and implemented C++ text normalisation code for the SLUITK.