News
[2017/09/11] We have released a Docker image of the code of the system that participated in BioASQ.
[2016/12/15] EBMSummariser is now live as a demo of the functionality that we aim to achieve.
[2011/10/27] The code of the clinical evidence detector is now available under an Apache License V2.0 as a SourceForge project.
[2011/06/31] The complete corpus based on the clinical inquiries of the Journal of Family Practice is now available. Contact Diego Mollá for further details, including a short introduction of yourself and a description of what you plan to do with the corpus.
Diego Mollá Aliod
Natural Language Processing of Medical Texts
This project focuses on the application of natural language processing techniques to medical texts. We place special emphasis on finding methods that help the practice of evidence based medicine.
Live Demos
- WebEBMSummariser - This is a demo of the sort of functionality that we wish to achieve in this project. Many of the backend components of this demo are simple baselines, so don't expect accurate results! But feel free to send us suggestions about the functionality, and do contact Diego if you wish to join this project and improve the system!
Resources
- Runs at BioASQ 11b [github]
- Runs at BioASQ 10 - Synergy [github]
- Runs at BioASQ 10b [github]
- Runs at BioASQ 9b [github]
- Runs at BioASQ 8 - Synergy [github]
- Runs at BioASQ 8b [docker] [github]
If you use this code, please cite:
D. Mollá, C. Jones, V. Nguyen. Query Focused Multi-document Summarisation of Biomedical Texts: Macquarie University and the Australian National University at BioASQ8b (2020). CLEF 2020 Working Notes. [arxiv] [docker image] [github]
- Runs at BioASQ 7b [docker]
If you use this code, please cite:
D. Mollá, C. Jones. Classification Betters Regression in Query-based Multi-document Summarisation Techniques for Question Answering: Macquarie University at BioASQ7b (2019). Proc. BioASQ Workshop at ECML-PKDD 2019. [arxiv] [docker image]
- Runs at BioASQ 6b [github]
If you use this code, please cite:
D. Mollá. Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based multi-document summarisation (2018). Proceedings of the 6th BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering. [arxiv] [github]
- Reinforcement learning [github]
If you use this code, please cite:
D. Mollá. Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarisation (2017). Proc. ALTA 2017. [arxiv] [code]
- Runs at BioASQ 5b [docker]
If you use this code, please cite:
D. Mollá. Macquarie University at BioASQ 5b -- Query-based Summarisation Techniques for Selecting the Ideal Answers (2017). Proc. BioNLP 2017. [arxiv] [docker image]
- Corpus for EBM Query-based Summarisation [sourceforge]
If you use this corpus, please cite:
D. Mollá, M.E. Santiago-Martínez, A. Sarker, C. Paris. A Corpus for Research in Text Processing for Evidence Based Medicine (2016). Language Resources and Evaluation, 50(4):705-727. DOI 10.1007/s10579-015-9327-2
- EBMSummariser [bitbucket]
If you use this code, please cite:
A. Sarker, D. Mollá, C. Paris. Query-Oriented Evidence Extraction to Support Evidence-based Medicine Practice (2016). Journal of Biomedical Informatics, 59:169-184. DOI 10.1016/j.jbi.2015.11.010
- Clinical Evidence Detector [sourceforge]
If you use this code, please cite:
P. Davis-Desmond and D. Mollá. Detection of Evidence in Clinical Research Papers (2012). Australasian Workshop On Health Informatics and Knowledge Management (HIKM 2012), Melbourne, Australia. [slides]
Members
- Diego Mollá (project director and chief investigator)
- Cécile Paris (partner investigator, CSIRO ICT Centre)
Past Members
The following people have done significant work for the project, leading to publication.
- Abeed Sarker (PhD student)
- Patrick Davis-Desmond (Masters student)
- Sara Faisal Shash (Masters student)
- Andreea Tutos (Masters student)
- Christopher Jones (research programmer, Master of Research student)
- Maria Elena Santiago-Martinez (research programmer)
- Hamed Hassanzadeh (research visitor)
- Urvashi Khanna (Master of Research student)
- Dima Galat (Master of Research student)
Join this Project
There are several ways you can join this project:
- As a PhD student: If you have an interesting topic of research that has to do with question answering and/or summarisation, send us an email and we will get in touch with you.
- As a Masters/Honours student: If you are a Masters or an Honours student at Macquarie University and you are searching for a project, consult the list of Honours projects. Many of these projects can be adapted to Masters projects.
- As an undergraduate student: If you are a student enrolled at Macquarie University, you can also work for AnswerFinder and get paid for it. Consult the list of summer projects.
Related Links
- Related Honours Projects
- Related PhD Scholarships
- Other projects at the Centre for Language Technology
Publications up to 2020
For publications from 2021 onwards, please visit my Macquarie University research page or my Google Scholar page.
D. Mollá, C. Jones, V. Nguyen. Query Focused Multi-document Summarisation of Biomedical Texts: Macquarie University and the Australian National University at BioASQ8b (2020). CLEF 2020 Working Notes. [arxiv] [github] [docker image]
D. Mollá, C. Jones. Classification Betters Regression in Query-based Multi-document Summarisation Techniques for Question Answering: Macquarie University at BioASQ7b (2019). Proc. BioASQ Workshop at ECML-PKDD 2019. [arxiv] [docker image]
M. Kaur and D. Mollá. Supervised Machine Learning for Extractive Query Based Summarisation of Biomedical Data (2018). Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis (Louhi 2018). [arxiv]
D. Mollá. Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based multi-document summarisation (2018). Proceedings of the 6th BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering. [arxiv] [github]
V. Nguyen, S. Karimi, S. Falamaki, D. Mollá, C. Paris, S. Wan. CSIRO at 2017 TREC Precision Medicine Track (2017). The Twenty-Sixth Text REtrieval Conference.
D. Mollá. Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarisation (2017). Proc. ALTA 2017. [arxiv] [code]
D. Mollá. Macquarie University at BioASQ 5b -- Query-based Summarisation Techniques for Selecting the Ideal Answers (2017). Proc. BioNLP 2017. [arxiv] [docker image]
A. Sarker, D. Mollá, C. Paris. Automated Text Summarisation and Evidence-Based Medicine: A Survey of Two Domains (2017). arXiv:1706.08162, 35 pages.
P. Sahoo, A. Ekbal, S. Saha, D. Mollá, K. Nandan. Semi-supervised Clustering of Medical Text (2016). Proceedings of the COLING 2016 Workshop on Clinical Natural Language Processing (ClinicalNLP). Best paper award in "long paper" category.
D. Mollá, M.E. Santiago-Martínez, A. Sarker, C. Paris. A Corpus for Research in Text Processing for Evidence Based Medicine (2016). Language Resources and Evaluation, 50(4):705-727. DOI 10.1007/s10579-015-9327-2
A. Sarker, D. Mollá, C. Paris. Query-Oriented Evidence Extraction to Support Evidence-based Medicine Practice (2016). Journal of Biomedical Informatics, 59:169-184. DOI 10.1016/j.jbi.2015.11.010
M. Yousefi Azar, K. Sirts, L. Hamey, D. Mollá. Query-Based Single Document Summarization Using an Ensemble Noisy Auto-Encoder (2015). Proceedings of the Australasian Language Technology Association Workshop 2015, pp. 2-10.
H. Hassanzadeh, D. Mollá, T. Groza, A. Nguyen, J. Hunter. Similarity Metrics for Clustering PubMed Abstracts for Evidence Based Medicine (2015). Proceedings of the Australasian Language Technology Association Workshop 2015, pp. 48-56.
A. Sarker, D. Mollá, C. Paris. Automatic Evidence Quality Prediction to Support Evidence-based Decision Making (2015). Artificial Intelligence in Medicine, 64(2):89-103. PubMed ID 25983133. DOI 10.1016/j.artmed.2015.04.001
D. Mollá, C. Jones, A. Sarker. Impact of Citing Papers for Summarisation of Clinical Documents (2014). Proceedings of the Australasian Language Technology Association Workshop 2014 (ALTA 2014), pp. 53-61, Brisbane, Australia. DOI 10.13140/2.1.2366.1126 [slideshare] [researchgate]
D. Mollá, I. Amini, D. Martinez. Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation (2014). Proceedings of the ACM SIGIR Workshop on Gathering Efficient Assessments of Relevance (GEAR), Gold Coast, Australia. [slideshare] [researchgate]
D. Mollá, D. Martinez, and I. Amini. Towards Information Retrieval Evaluation with Reduced and Only Positive Judgements (2013). Proceedings of the 18th Australasian Document Computing Symposium (ADCS 2013), Brisbane, Australia. [poster]
A. Ekbal, S. Saha, D. Mollá, and K. Ravikumar. Multi-Objective Optimization for Clustering of Medical Publications (2013). Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013), pp. 53-61, Brisbane, Australia. [slideshare]
A. Sarker, D. Mollá, C. Paris. Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification (2013). Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pp. 712-718, Nagoya, Japan. [slideshare]
A. Sarker, D. Mollá, C. Paris. An Approach for Query-Focused Text Summarisation for Evidence Based Medicine (2013). Proceedings of the 14th Conference on Artificial Intelligence in Medicine (AIME 2013), Murcia, Spain. Lecture Notes in Computer Science, Vol. 7885, pp. 295-304. [manuscript]
S.F. Shash, D. Mollá. Clustering of Medical Publications for Evidence Based Medicine Summarisation (2013). Proceedings of the 14th Conference on Artificial Intelligence in Medicine (AIME 2013), Murcia, Spain. Lecture Notes in Computer Science, Vol. 7885, pp. 305-309. [manuscript] [poster]
A. Sarker, D. Mollá and C. Paris. An Approach for Automatic Multi-label Classification of Medical Sentences (2013). Proceedings of the Fourth International Workshop on Health Document Text Mining and Information Analysis (LOUHI 2013), Sydney, Australia.
D. Mollá. Experiments with Clustering-based Features for Sentence Classification in Medical Publications: Macquarie Test's participation in the ALTA 2012 shared task (2012). Proceedings of the 2012 Australasian Language Technology Workshop (ALTA 2012), Dunedin, New Zealand.
I. Amini, D. Martinez and D. Mollá. Overview of the ALTA 2012 Shared Task (2012). Proceedings of the 2012 Australasian Language Technology Workshop (ALTA 2012), Dunedin, New Zealand.
A. Sarker, D. Mollá and C. Paris. Towards Two-step Multi-document Summarisation for Evidence Based Medicine: A Quantitative Analysis (2012). Proceedings of the 2012 Australasian Language Technology Workshop (ALTA 2012), Dunedin, New Zealand.
D. Martinez, A. MacKinlay, D. Mollá, L. Cavedon and K. Verspoor. Simple similarity-based question answering strategies for biomedical text (2012). QA4MRE Workshop at Conference and Labs of the Evaluation Forum (CLEF 2012), September 17-20, 2012.
D. Mollá and M.E. Santiago-Martínez. Creation of a Corpus for Evidence Based Medicine Summarisation (2012). Australasian Medical Journal, 5(9).
A. Sarker, D. Mollá and C. Paris. Extractive Summarisation of Medical Documents using Domain Knowledge and Corpus Statistics (2012). Australasian Medical Journal.
A. Sarker, D. Mollá and C. Paris. Extractive Evidence Based Medicine Summarisation Based on Sentence-Specific Statistics (2012). Proceedings of the 25th IEEE International Symposium on Computer-based Medical Systems (CBMS 2012), Rome, Italy. [slides]
P. Davis-Desmond and D. Mollá. Detection of Evidence in Clinical Research Papers (2012). Australasian Workshop On Health Informatics and Knowledge Management (HIKM 2012), Melbourne, Australia. [slides]
D. Mollá and M.E. Santiago-Martínez. Creation of a Corpus for Evidence Medicine Summarisation (2011). Proceedings of the First Australian Workshop on Artificial Intelligence in Health (AIH 2011), Perth, Australia. [poster]
A. Sarker, D. Mollá and C. Paris. Extractive Summarisation of Medical Documents using Domain Knowledge and Corpus Statistics (2011). Proceedings of the First Australian Workshop on Artificial Intelligence in Health (AIH 2011), Perth, Australia.
D. Mollá and A. Sarker. Automatic Grading of Evidence: The 2011 ALTA Shared Task (2011). Proceedings of the 2011 Australasian Language Technology Workshop (ALTA 2011), Canberra, Australia.
A. Sarker, D. Mollá and C. Paris. Outcome Polarity Identification of Medical Papers (2011). Proceedings of the 2011 Australasian Language Technology Workshop (ALTA 2011), Canberra, Australia.
D. Mollá and M.E. Santiago-Martínez. Development of a Corpus for Evidence Medicine Summarisation (2011). Proceedings of the 2011 Australasian Language Technology Workshop (ALTA 2011), Canberra, Australia. [slideshare]
A. Sarker, D. Mollá and C. Paris. Towards Automatic Grading of Evidence (2011). Proceedings of the Third International Workshop on Health Document Text Mining and Information Analysis (LOUHI 2011), pp. 51-58, Bled, Slovenia.
A. Sarker and D. Mollá. A Rule-based Approach for Automatic Identification of Publication Types of Medical Papers (2010). Proceedings ADCS 2010, 5 pages. Melbourne.
D. Mollá. A Corpus for Evidence Based Medicine Summarisation (2010). Proceedings ALTA 2010, pp. 76-80. Melbourne. [slides]
A. Tutos and D. Mollá. A Study on the Use of Search Engines for Question Answering in Biomedicine (2010). Australasian Workshop On Health Informatics and Knowledge Management (HIKM), 8 pages. Brisbane. [slides]
Other Presentations
Macquarie University Workshop on Text Mining and Health, opening presentation, Macquarie University, Sydney, September 2014.
Text Summarisation for Evidence Based Medicine, presentation at the Indo-Australia Workshop on Optimization Techniques for Human Language Technology, Indian Institute of Technology Patna, 16 December 2012.
Automated Summarisation for Evidence Based Medicine, HAIL seminar, 22 March 2012.
Funding
This research is partly funded by a Macquarie University Research Development Grant and by CSIRO.

Parts of this work are licensed under the GNU General Public License, version 3 (GPLv3).
Parts of this work are licensed under the Apache License, Version 2.0.

