News
[2017/09/11] We have released a Docker image of the code of the system that participated in BioASQ.
[2016/12/15] We have made live EBMSummariser, a demo of the functionality that we aim to achieve.
[2011/10/27] The code of the clinical evidence detector is now available under an Apache License V2.0 as a SourceForge project.
[2011/06/31] The complete corpus based on the clinical inquiries of the Journal of Family Practice is now available. Contact Diego Mollá for further details. Include a short introduction of yourself and what you plan to do with the corpus.
Diego Mollá Aliod
Natural Language Processing of Medical Texts
This project focuses on the application of natural language processing techniques to medical texts. We place special emphasis on finding methods to help the practice of evidence based medicine.
Live Demos
- WebEBMSummariser - This is a demo of the sort of functionality that we wish to achieve in this project. Many of the backend components of this demo are simple baselines, so don't expect to get accurate results! But feel free to send us suggestions about the functionality, and definitely contact Diego if you wish to join this project and improve the system!
Resources
- Runs at BioASQ 11b [github]
- Runs at BioASQ 10 - Synergy [github]
- Runs at BioASQ 10b [github]
- Runs at BioASQ 9b [github]
- Runs at BioASQ 8 - Synergy [github]
- Runs at BioASQ 8b [docker] [github]
If you use this code, please cite:
D. Mollá, C. Jones, V. Nguyen. Query Focused Multi-document Summarisation of Biomedical Texts: Macquarie University and the Australian National University at BioASQ8b (2020). CLEF 2020 Working Notes. [arxiv] [docker image] [github]
- Runs at BioASQ 7b [docker]
If you use this code, please cite:
D. Mollá, C. Jones. Classification Betters Regression in Query-based Multi-document Summarisation Techniques for Question Answering: Macquarie University at BioASQ7b (2019). Proc. BioASQ Workshop at ECML-PKDD 2019. [arxiv] [docker image]
- Runs at BioASQ 6b [github]
If you use this code, please cite:
D. Mollá. Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based multi-document summarisation (2018). Proceedings of the 6th BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering. [arxiv] [github]
- Reinforcement learning [github]
If you use this code, please cite:
D. Mollá. Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarisation (2017). Proc. ALTA 2017. [arxiv] [code]
- Runs at BioASQ 5b [docker]
If you use this code, please cite:
D. Mollá. Macquarie University at BioASQ 5b -- Query-based Summarisation Techniques for Selecting the Ideal Answers (2017). Proc. BioNLP 2017. [arxiv] [docker image]
- Corpus for EBM Query-based Summarisation [sourceforge]
If you use this corpus, please cite:
D. Mollá, M.E. Santiago-Martínez, A. Sarker, C. Paris. A Corpus for Research in Text Processing for Evidence Based Medicine (2016). Language Resources and Evaluation, 50(4):705-727. DOI 10.1007/s10579-015-9327-2
- EBMSummariser [bitbucket]
If you use this code, please cite:
A. Sarker, D. Mollá, C. Paris. Query-Oriented Evidence Extraction to Support Evidence-based Medicine Practice (2016). Journal of Biomedical Informatics, 59:169-184. DOI 10.1016/j.jbi.2015.11.010
- Clinical Evidence Detector [sourceforge]
If you use this code, please cite:
P. Davis-Desmond and D. Mollá. Detection of Evidence in Clinical Research Papers (2012). Australasian Workshop On Health Informatics and Knowledge Management (HIKM 2012), Melbourne, Australia. [slides]
Members
- Diego Mollá (project director and chief investigator)
- Cécile Paris (partner investigator, CSIRO ICT Centre)
Past Members
The following people have done significant work for the project, leading to publication.
- Abeed Sarker (PhD student)
- Patrick Davis-Desmond (Masters student)
- Sara Faisal Shash (Masters student)
- Andreea Tutos (Masters student)
- Christopher Jones (research programmer, Master of Research student)
- Maria Elena Santiago-Martinez (research programmer)
- Hamed Hassanzadeh (research visitor)
- Urvashi Khanna (Master of Research student)
- Dima Galat (Master of Research student)
Join this Project
There are several ways you can join this project:
- As a PhD student: If you have an interesting topic of research that has to do with question answering and/or summarisation, send us an email and we will get in touch with you.
- As a Masters/Honours student: If you are a Masters or an Honours student at Macquarie University and you are searching for a project, consult the list of Honours projects. Many of these projects can be adapted to Masters projects.
- As an undergraduate student: If you are a student enrolled at Macquarie University you can also work for AnswerFinder and get paid for it. Consult the list of summer projects.
Related Links
- Related Honours Projects
- Related PhD Scholarships
- Other projects at the Centre for Language Technology
Publications up to 2020
For any publications from 2021 onwards, please visit my Macquarie University research page or my Google Scholar page.
- D. Mollá, C. Jones, V. Nguyen. Query Focused Multi-document Summarisation of Biomedical Texts: Macquarie University and the Australian National University at BioASQ8b (2020). CLEF 2020 Working Notes. [arxiv] [github] [docker image]
- D. Mollá, C. Jones. Classification Betters Regression in Query-based Multi-document Summarisation Techniques for Question Answering: Macquarie University at BioASQ7b (2019). Proc. BioASQ Workshop at ECML-PKDD 2019. [arxiv] [docker image]
- M. Kaur and D. Mollá. Supervised Machine Learning for Extractive Query Based Summarisation of Biomedical Data (2018). Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis (Louhi 2018). [arxiv]
- D. Mollá. Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based multi-document summarisation (2018). Proceedings of the 6th BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering. [arxiv] [github]
- V. Nguyen, S. Karimi, S. Falamaki, D. Mollá, C. Paris, S. Wan. CSIRO at 2017 TREC Precision Medicine Track (2017). The Twenty-Sixth Text REtrieval Conference.
- D. Mollá. Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarisation (2017). Proc. ALTA 2017. [arxiv] [code]
- D. Mollá. Macquarie University at BioASQ 5b -- Query-based Summarisation Techniques for Selecting the Ideal Answers (2017). Proc. BioNLP 2017. [arxiv] [docker image]
- A. Sarker, D. Mollá, C. Paris. Automated Text Summarisation and Evidence-Based Medicine: A Survey of Two Domains (2017). arXiv:1706.08162, 35 pages.
- P. Sahoo, A. Ekbal, S. Saha, D. Mollá, K. Nandan. Semi-supervised Clustering of Medical Text (2016). Proceedings of the COLING 2016 Workshop on Clinical Natural Language Processing (ClinicalNLP). Best paper award in "long paper" category.
- D. Mollá, M.E. Santiago-Martínez, A. Sarker, C. Paris. A Corpus for Research in Text Processing for Evidence Based Medicine (2016). Language Resources and Evaluation, 50(4):705-727. DOI 10.1007/s10579-015-9327-2
- A. Sarker, D. Mollá, C. Paris. Query-Oriented Evidence Extraction to Support Evidence-based Medicine Practice (2016). Journal of Biomedical Informatics, 59:169-184. DOI 10.1016/j.jbi.2015.11.010
- M. Yousefi Azar, K. Sirts, L. Hamey, D. Mollá. Query-Based Single Document Summarization Using an Ensemble Noisy Auto-Encoder (2015). Proceedings of the Australasian Language Technology Association Workshop 2015, pp2-10
- H. Hassanzadeh, D. Mollá, T. Groza, A. Nguyen, J. Hunter. Similarity Metrics for Clustering PubMed Abstracts for Evidence Based Medicine (2015). Proceedings of the Australasian Language Technology Association Workshop 2015, pp48-56
- A. Sarker, D. Mollá, C. Paris. Automatic Evidence Quality Prediction to Support Evidence-based Decision Making (2015). Artificial Intelligence in Medicine, 64(2):89-103. PubMed ID 25983133. DOI 10.1016/j.artmed.2015.04.001
- D. Mollá, C. Jones, A. Sarker. Impact of Citing Papers for Summarisation of Clinical Documents (2014). Proceedings of the Australasian Language Technology Association Workshop 2014 (ALTA 2014), pp53-61, Brisbane, Australia. DOI 10.13140/2.1.2366.1126 [slideshare] [researchgate]
- D. Mollá, I. Amini, D. Martinez. Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation (2014). Proceedings ACM SIGIR Workshop on Gathering Efficient Assessments of Relevance (GEAR), Gold Coast, Australia. [slideshare] [researchgate]
- D. Mollá, D. Martinez, and I. Amini. Towards Information Retrieval Evaluation with Reduced and Only Positive Judgements (2013). Proceedings of the 18th Australasian Document Computing Symposium (ADCS 2013), Brisbane, Australia. [poster]
- A. Ekbal, S. Saha, D. Mollá, and K. Ravikumar. Multi-Objective Optimization for Clustering of Medical Publications (2013). Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013), pp53-61, Brisbane, Australia. [slideshare]
- A. Sarker, D. Mollá, C. Paris. Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification (2013). Proceedings 6th International Joint Conference on Natural Language Processing (IJCNLP2013), pp712-718, Nagoya, Japan. [slideshare]
- A. Sarker, D. Mollá, C. Paris. An Approach for Query-Focused Text Summarisation for Evidence Based Medicine (2013). Proceedings of the 14th Conference on Artificial Intelligence in Medicine (AIME 2013), Murcia, Spain. Lecture Notes in Computer Science, Vol. 7885, pp 295-304. [manuscript]
- S.F. Shash, D. Mollá. Clustering of Medical Publications for Evidence Based Medicine Summarisation (2013). Proceedings of the 14th Conference on Artificial Intelligence in Medicine (AIME 2013), Murcia, Spain. Lecture Notes in Computer Science, Vol. 7885, pp 305-309. [manuscript] [poster]
- A. Sarker, D. Mollá and C. Paris. An Approach for Automatic Multi-label Classification of Medical Sentences (2013). Proceedings of the Fourth International Workshop on Health Document Text Mining and Information Analysis (LOUHI 2013), Sydney, Australia.
- D. Mollá. Experiments with Clustering-based Features for Sentence Classification in Medical Publications: Macquarie Test's participation in the ALTA 2012 shared task (2012). Proceedings of the 2012 Australasian Language Technology Workshop (ALTA 2012), Dunedin, New Zealand.
- I. Amini, D. Martinez and D. Mollá. Overview of the ALTA 2012 Shared Task (2012). Proceedings of the 2012 Australasian Language Technology Workshop (ALTA 2012), Dunedin, New Zealand.
- A. Sarker, D. Mollá and C. Paris. Towards Two-step Multi-document Summarisation for Evidence Based Medicine: A Quantitative Analysis (2012). Proceedings of the 2012 Australasian Language Technology Workshop (ALTA 2012), Dunedin, New Zealand.
- D. Martinez, A. MacKinlay, D. Mollá, L. Cavedon and K. Verspoor. Simple similarity-based question answering strategies for biomedical text (2012). QA4MRE Workshop at Conference and Labs of the Evaluation Forum, CLEF 2012. September 17-20, 2012.
- D. Mollá and M.E. Santiago-Martínez. Creation of a Corpus for Evidence Based Medicine Summarisation (2012). Australasian Medical Journal, 5(9).
- A. Sarker, D. Mollá and C. Paris. Extractive Summarisation of Medical Documents using Domain Knowledge and Corpus Statistics (2012). Australasian Medical Journal.
- A. Sarker, D. Mollá and C. Paris. Extractive Evidence Based Medicine Summarisation Based on Sentence-Specific Statistics (2012). Proceedings of the 25th IEEE International Symposium on Computer-based Medical Systems (CBMS2012), Rome, Italy. [slides]
- P. Davis-Desmond and D. Mollá. Detection of Evidence in Clinical Research Papers (2012). Australasian Workshop On Health Informatics and Knowledge Management (HIKM 2012), Melbourne, Australia. [slides]
- D. Mollá and M.E. Santiago-Martínez. Creation of a Corpus for Evidence Medicine Summarisation (2011). Proceedings of the First Australian Workshop on Artificial Intelligence in Health (AIH 2011), Perth, Australia. [poster]
- A. Sarker, D. Mollá and C. Paris. Extractive Summarisation of Medical Documents using Domain Knowledge and Corpus Statistics (2011). Proceedings of the First Australian Workshop on Artificial Intelligence in Health (AIH 2011), Perth, Australia.
- D. Mollá and A. Sarker. Automatic Grading of Evidence: The 2011 ALTA Shared Task (2011). Proceedings of the 2011 Australasian Language Technology Workshop (ALTA 2011), Canberra, Australia.
- A. Sarker, D. Mollá and C. Paris. Outcome Polarity Identification of Medical Papers (2011). Proceedings of the 2011 Australasian Language Technology Workshop (ALTA 2011), Canberra, Australia.
- D. Mollá and M.E. Santiago-Martínez. Development of a Corpus for Evidence Medicine Summarisation (2011). Proceedings of the 2011 Australasian Language Technology Workshop (ALTA 2011), Canberra, Australia. [slideshare]
- A. Sarker, D. Mollá and C. Paris. Towards Automatic Grading of Evidence (2011). Proceedings of the Third International Workshop on Health Document Text Mining and Information Analysis (LOUHI 2011), pp51-58. Bled, Slovenia.
- A. Sarker and D. Mollá. A Rule-based Approach for Automatic Identification of Publication Types of Medical Papers (2010). Proceedings ADCS 2010, 5 pages. Melbourne.
- D. Mollá. A Corpus for Evidence Based Medicine Summarisation (2010). Proceedings ALTA 2010, pp.76-80. Melbourne. [slides]
- A. Tutos and D. Mollá. A Study on the Use of Search Engines for Question Answering in Biomedicine (2010). Australasian Workshop On Health Informatics and Knowledge Management (HIKM), 8 pages. Brisbane. [slides]
Other Presentations
- Macquarie University Workshop on Text Mining and Health, opening presentation at the Macquarie University Workshop on Text Mining and Health, Macquarie University, Sydney, September 2014.
- Text Summarisation for Evidence Based Medicine, presentation at the Indo-Australia Workshop on Optimization Techniques for Human Language Technology, Indian Institute of Technology Patna, December 16 2012.
- Automated Summarisation for Evidence Based Medicine, HAIL seminar 22 March 2012.
Funding
This research is partly funded by a Macquarie University Research Development Grant and by CSIRO.
Parts of this work are licensed under the GNU General Public License, version 3 (GPLv3).
Parts of this work are licensed under the Apache License, Version 2.0.