Diego Mollá Aliod
List of Undergraduate and Postgraduate Projects
Below is a list of possible projects for Honours and Masters students at Macquarie University. Please contact me for further details. If you have a project in mind that is not listed here but is related to my research interests, please contact me as well; chances are that I will be interested too.
If you are pursuing a PhD, some of these projects can be extended into PhD projects.
Natural Language Processing for Medical Text
- Evidence Grading versus Google PageRank and Bibliometrics (Masters, Honours)
- Cluster Clinical Research Papers (Masters, Honours)
- Extract Population Sizes (Masters)
- Classification of Medical Questions (Honours)
- Normalisation of Full Text (Masters)
Evidence Grading versus Google PageRank and Bibliometrics
(Masters or Honours project)
We have a corpus of questions and answer documents. The answer documents are grouped into sets of related documents, and each group is annotated with an index indicating the quality of the combined answer according to the criteria of Evidence Based Medicine. The task is to test whether approaches typically used to rank documents can approximate this evidence grading. In particular, you will study Google's PageRank and other ranking criteria used by search engines, as well as bibliometrics and other criteria used to assess the quality of research publications.
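As a starting point for the PageRank side of this comparison, here is a minimal power-iteration PageRank in Python. It assumes that the answer documents and the papers they cite have been turned into a citation graph, represented as a dict from each document id to the ids it cites; that representation, the function name `pagerank` and the example ids are hypothetical. Documents that cite nothing ("dangling" nodes) spread their score evenly over all documents, which is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each document id to the list of document ids it cites
    (a hypothetical citation graph built from the corpus).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the documents it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (cites nothing): spread its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    citations = {
        "paper_a": ["paper_c"],
        "paper_b": ["paper_c"],
        "paper_c": [],
    }
    for doc, score in sorted(pagerank(citations).items(), key=lambda kv: -kv[1]):
        print(doc, round(score, 3))
```

The resulting scores could then be aggregated per answer group (for example by taking the mean or maximum) and compared with the annotated evidence grades using a rank correlation.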
Cluster Clinical Research Papers
(Masters or Honours project)
We have a corpus of questions and answers. Each answer has a list of references, arranged in groups according to the part of the answer they support. For example, if the question is "what is the best treatment for X", then the answer will have a group of references for each major treatment of X. The goal of this project is to automatically infer the clusters of references associated with each question. A secondary goal is to label the clusters by determining the key terms that characterise each cluster.
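One possible baseline, sketched here under the assumption that each reference is represented by the text of its title or abstract, is to build TF-IDF vectors, cluster them with spherical k-means (k-means using cosine similarity), and label each cluster with the highest-weighted terms of its centroid. This is a swapped-in technique for illustration, not a method prescribed by the project; the number of clusters `k` would in practice have to be chosen or estimated, and all function names are hypothetical. The sketch uses only the standard library so that each step is visible; a library such as scikit-learn would normally be used instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """One sparse, L2-normalised TF-IDF vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    return [normalise({t: tf * math.log(n / df[t]) for t, tf in c.items()}) for c in counts]


def cosine(a, b):
    # Both vectors are L2-normalised, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return normalise(total)


def kmeans(vectors, k, iterations=20):
    """Spherical k-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    assignment = []
    for _ in range(iterations):
        # Assign each vector to its most similar centroid, then recompute centroids.
        assignment = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = centroid(members)
    return assignment, centroids


def cluster_label(centroid_vec, top_n=3):
    """Label a cluster with its highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(centroid_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


if __name__ == "__main__":
    refs = [
        "exercise training improves knee pain",
        "exercise program reduces knee pain",
        "surgery for knee replacement outcomes",
        "arthroscopic surgery outcomes in knee",
    ]
    assignment, centroids = kmeans(tfidf_vectors(refs), k=2)
    for j, c in enumerate(centroids):
        print(j, cluster_label(c))
```

Terms that occur in every reference (here "knee") receive zero IDF weight, so they do not influence either the clusters or the labels.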
Extract Population Sizes
(Masters project)
An important factor in the quality of the evidence provided by a clinical study is the size of the population sampled in its experiments. The goal of this project is to determine this size from the text of a clinical study. The task involves determining whether a text describes experiments and reports the size of the population, then extracting the expressions that describe the size and analysing them.
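A first rule-based baseline might look for explicit sample-size expressions with regular expressions. The sketch below assumes two hypothetical patterns, "n = 120" and "a number, up to two modifier words, then a subject noun such as patients or participants"; the noun list, the patterns and the heuristic that the largest number is the total population are all illustrative assumptions. Spelled-out numbers, ranges and sizes split across sentences are not handled.

```python
import re

# Hypothetical list of nouns that typically denote study subjects.
SUBJECT_NOUNS = r"(?:patients|participants|subjects|children|adults|women|men|infants|volunteers)"
NUMBER = r"(\d{1,3}(?:,\d{3})+|\d+)"  # accepts "1,204" or "1204"

PATTERNS = [
    # Explicit sample size, e.g. "n = 120" or "N=1,204".
    re.compile(r"\bn\s*=\s*" + NUMBER, re.IGNORECASE),
    # A number followed by up to two modifiers and a subject noun,
    # e.g. "120 patients" or "1,204 adult participants".
    re.compile(r"\b" + NUMBER + r"\s+(?:[a-z-]+\s+){0,2}" + SUBJECT_NOUNS + r"\b", re.IGNORECASE),
]


def extract_population_sizes(text):
    """Return candidate population sizes in the order they appear in the text."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), int(match.group(1).replace(",", ""))))
    return [size for _, size in sorted(found)]


def estimate_total_population(text):
    """Heuristic (assumption): the largest reported size is the whole study population."""
    return max(extract_population_sizes(text), default=None)


if __name__ == "__main__":
    abstract = "We randomised 1,204 adult patients to two arms (n = 602 each)."
    print(extract_population_sizes(abstract), estimate_total_population(abstract))
```

A rule set like this is mainly useful for bootstrapping annotated examples; the analysis of the extracted expressions (for instance, telling total size from per-arm size) is where most of the project's work would lie.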
Classification of Medical Questions
(Honours project)
We have a collection of medical questions and their answers. The goal of this project is to cluster the questions according to the structure of their answers and to compare the resulting question taxonomy with existing question taxonomies. The project involves the application of unsupervised machine learning (clustering, topic modelling) and the analysis of the results.
Normalisation of Full Text
(Masters project)
We have a collection of medical questions and their answers, together with pointers to the full text of the papers. However, the format of these papers varies from publisher to publisher. The goal of this project is to extract the full text and normalise as much of it as possible. You will first determine the target format, which will likely be based on the Text Encoding Initiative (TEI) or PubMed Central formats. You will then identify the types of publications whose format is easy to analyse, and then move on to more difficult formats. Most of the work will be done on HTML and XML text.
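To illustrate the normalisation step, the sketch below uses Python's standard `html.parser` and `xml.etree.ElementTree` modules to pull headings and paragraphs out of publisher HTML and write them into a small TEI-inspired skeleton. The tag mapping, the list of skipped elements and the class and function names are assumptions for illustration. The output is not valid TEI: it omits the required `teiHeader` and the TEI namespace, and a real converter would need per-publisher rules for tables, figures and reference lists.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical mapping from publisher HTML tags to TEI-style element names.
TAG_MAP = {f"h{i}": "head" for i in range(1, 7)}
TAG_MAP["p"] = "p"
# Hypothetical list of elements whose content is page furniture, not article text.
SKIP = {"script", "style", "nav", "footer"}


class FullTextExtractor(HTMLParser):
    """Collect (tei_tag, text) blocks for headings and paragraphs."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None   # TEI tag of the block being collected
        self._buffer = []
        self._skip_depth = 0   # > 0 while inside a skipped element

    def flush(self):
        # Close the open block, collapsing runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if self._current is not None and text:
            self.blocks.append((self._current, text))
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in TAG_MAP and self._skip_depth == 0:
            self.flush()  # also closes a previous <p> left unclosed in the HTML
            self._current = TAG_MAP[tag]

    def handle_endtag(self, tag):
        if tag in SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in TAG_MAP:
            self.flush()

    def handle_data(self, data):
        if self._current is not None and self._skip_depth == 0:
            self._buffer.append(data)


def to_tei_like(html):
    parser = FullTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    root = ET.Element("TEI")
    div = ET.SubElement(ET.SubElement(ET.SubElement(root, "text"), "body"), "div")
    for tag, text in parser.blocks:
        ET.SubElement(div, tag).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = "<nav><p>Menu</p></nav><h2>Methods</h2><p>We enrolled <b>40</b> patients.</p>"
    print(to_tei_like(page))
```

Inline markup such as `<b>` is flattened into plain text here; keeping it (for example as TEI `<hi>` elements) would be one of the harder formats mentioned above.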
Natural Language Processing
- Frameworks for MapReduce Computations (Masters)
Frameworks for MapReduce Computations
(Masters analysis project)
MapReduce is a programming model and software framework introduced by Google to support distributed computing on large data sets across clusters of computers. Other frameworks implement the same model, such as Apache Hadoop. The goal of this project is to test and analyse the current frameworks on text processing tasks in a distributed computing setting.
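To make the programming model concrete, here is a single-process Python simulation of the classic word-count job, showing the map, shuffle (group-by-key) and reduce phases that a framework such as Hadoop distributes across machines. It does not use any framework's API; the function names are hypothetical, and an actual comparison would run the equivalent job on each framework under test.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every token of one document.

    doc_id is the input key, as in MapReduce; word count does not need it.
    """
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all partial counts for one word."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce sequentially over a dict of doc_id -> text."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    docs = {"d1": "Aspirin reduces pain.", "d2": "aspirin and pain"}
    print(run_job(docs))
```

Because each mapper sees only one document and each reducer sees only one key, both phases can be parallelised independently; the frameworks compared in this project automate that parallelisation, together with data distribution and recovery from failed workers.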

