About Me

I'm a Research Fellow applying Natural Language Processing (NLP) and machine learning methods in the medical domain. I also continue to collaborate with the Department of Computing at Macquarie University.

Previously, I completed my PhD at Macquarie University's Centre for Language Technology under the supervision of Associate Prof Mark Dras and Prof Mark Johnson.

My PhD research focused primarily on Native Language Identification (NLI), the task of identifying an author's native language (L1) using only their writings in a second language (L2). I investigated how NLP and machine learning can aid linguistic research in Second Language Acquisition and help detect language transfer effects. This involves using large-scale learner data to identify common error types or omissions made by non-native speakers, as well as idiosyncratic constructions associated with particular linguistic backgrounds. NLI can also be used for Authorship Profiling and Identification in the context of Forensic Linguistics, where it can provide evidence about a writer's linguistic background when a text, such as an anonymous letter, is the key piece of evidence in an investigation.

More broadly, my research interests include machine learning, NLP, data mining, information extraction, forensic linguistics, language learning, automated language assessment, language transfer, language identification and computer vision.

Publications

2018

  • Shervin Malmasi and Mark Dras (2018) Native Language Identification With Classifier Stacking and Ensembles. Computational Linguistics 44.3, pp. 403–446. [bib]
  • Shervin Malmasi and Marcos Zampieri (2018) Challenges in Discriminating Profanity from Hate Speech. Journal of Experimental & Theoretical Artificial Intelligence, 30:2, pp. 187-202. [bib]
  • Ritesh Kumar, Atul Ojha, Shervin Malmasi and Marcos Zampieri (2018) Benchmarking Aggression Identification in Social Media. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pp. 1-11. [bib]
  • Shervin Malmasi, Iria del Río and Marcos Zampieri (2018) Portuguese Native Language Identification. Proceedings of International Conference on the Computational Processing of Portuguese (PROPOR). [bib]
  • Iria del Río, Marcos Zampieri, and Shervin Malmasi (2018) A Portuguese Native Language Identification Dataset. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA-13), pp. 291-296. New Orleans, Louisiana, USA. [bib]
  • Fernando Benites, Shervin Malmasi and Marcos Zampieri (2018) Classifying Patent Applications with Ensemble Methods. Proceedings of the Australasian Language Technology Workshop (ALTA). Dunedin, New Zealand. [bib] [poster]
  • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri (2018) A Report on the Complex Word Identification Shared Task 2018. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA-13), pp. 66-78. New Orleans, Louisiana, USA. [bib]
  • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Antal van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain (2018) Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign. Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 1-17. Santa Fe, New Mexico, USA. [bib]
  • Alina Maria Ciobanu, Shervin Malmasi, and Liviu P. Dinu (2018) German Dialect Identification Using Classifier Ensembles. Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 288-294. Santa Fe, New Mexico, USA. [bib]
  • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, and Liviu P. Dinu (2018) Discriminating between Indo-Aryan Languages Using SVM Ensembles. Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 178-184. Santa Fe, New Mexico, USA. [bib]

2017

  • N. Hosomura, S. Malmasi, D. Timerman, V.J. Lei, H. Zhang, L. Chang and A. Turchin (2017) Decline of Insulin Therapy by People with Uncontrolled Diabetes Mellitus and Delays in Insulin Initiation. Diabetic Medicine. DOI: 10.1111/dme.13454. [bib] [abstract]
  • Shervin Malmasi and Mark Dras (2017) Feature Hashing for Language and Dialect Identification. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 399-403. Vancouver, Canada. [bib]
  • Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du and Magdalena Wolska (2017) Unsupervised Text Segmentation Based on Native Language Characteristics. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 1457-1469. Vancouver, Canada. [bib]
  • Shervin Malmasi and Mark Dras (2017) Native Language Identification using Stacked Generalization. arXiv preprint arXiv:1703.06541. [bib]
  • Shervin Malmasi, N. L. Sandor, N. Hosomura, M. Goldberg, S. Skentzos and A. Turchin (2017) Canary: An NLP Platform for Clinicians and Researchers. Applied Clinical Informatics 08(02). pp. 447-453. [bib]
  • Shervin Malmasi, Naoshi Hosomura, Lee-Shing Chan, C. Justin Brown, Stephen Skentzos and Alexander Turchin (2017) Extracting Healthcare Quality Information from Unstructured Data. American Medical Informatics Association Annual Symposium Proceedings (AMIA 2017), 1243-1252. Washington, DC, USA. [bib]
  • Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, and Yao Qian (2017) A Report on the 2017 Native Language Identification Shared Task. Proceedings of the 12th Workshop on Building Educational Applications Using NLP, 62-75. Copenhagen, Denmark. [bib]
  • Shervin Malmasi and Marcos Zampieri (2017) Detecting Hate Speech in Social Media. Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2017), 467-472. Varna, Bulgaria. [bib]
  • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi and Liviu P. Dinu (2017) Including Dialects And Language Varieties In Author Profiling. CLEF 2017 Working Notes. Dublin, Ireland. [bib]
  • N. Hosomura, S. Malmasi, D. Timerman, V. Lei, H. Zhang, L. Chang and A. Turchin (2017) Insulin Decline In Patients With Uncontrolled Diabetes Mellitus. The 77th American Diabetes Association Scientific Sessions.
  • Marcos Zampieri, S. Malmasi, N. Ljubešić, P. Nakov, A. Ali, J. Tiedemann, Y. Scherrer and N. Aepli (2017) Findings of the VarDial Evaluation Campaign 2017. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), 1-15. Valencia, Spain. [bib]
  • Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold and Lucia Specia (2017) Complex Word Identification: Challenges in Data Annotation and System Performance. Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017), 59-63. Taipei, Taiwan. [bib]
  • Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, Josef van Genabith (2017) Exploring the Use of Text Classification in the Legal Domain. Proceedings of 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL). London, United Kingdom. [bib]
  • Shervin Malmasi and Marcos Zampieri (2017) German Dialect Identification in Interview Transcriptions. Proceedings of the Fourth Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (VarDial), 164-169. Valencia, Spain. [bib]
  • Shervin Malmasi and Marcos Zampieri (2017) Arabic Dialect Identification Using iVectors and ASR Transcripts. Proceedings of the Fourth Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (VarDial), 178-183. Valencia, Spain. [bib]

2016

  • Shervin Malmasi, M. Zampieri, N. Ljubešić, P. Nakov, A. Ali, and J. Tiedemann (2016) Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task. Proceedings of the 3rd Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (VarDial), 1-14. Osaka, Japan. [bib]
  • Shervin Malmasi (2016) Subdialectal Differences in Sorani Kurdish. Proceedings of the 3rd Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (VarDial), 89-96. Osaka, Japan. [bib] [Sorani corpus]
  • Shervin Malmasi and Marcos Zampieri (2016) Arabic Dialect Identification in Speech Transcripts. Proceedings of the 3rd Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (VarDial), 106-113. Osaka, Japan. [bib]
  • Marcos Zampieri, Shervin Malmasi, Octavia-Maria Sulea and Liviu P. Dinu (2016) A Computational Approach to the Study of Portuguese Newspapers Published in Macau. Proceedings of the Workshop on Natural Language Processing Meets Journalism (NLPMJ), 47-51. New York, NY, USA. [bib]
  • Shervin Malmasi, Mark Dras and Marcos Zampieri (2016) Predicting Post Severity in Mental Health Forums. Proceedings of the Third Computational Linguistics and Clinical Psychology Workshop (CLPsych), 133-137. San Diego, California, USA. [bib]
  • Shervin Malmasi, Mark Dras and Marcos Zampieri (2016) LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016). [bib]
  • Shervin Malmasi and Marcos Zampieri (2016) MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016). [bib]
  • Marcos Zampieri, Shervin Malmasi and Mark Dras (2016) Modeling Language Change in Historical Corpora: The Case of Portuguese. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). [bib] [preprint version]
  • Cyril Goutte, Serge Léger, Shervin Malmasi and Marcos Zampieri (2016) Discriminating Similar Languages: Evaluations and Explorations. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). [bib]

2015

  • Shervin Malmasi and Mark Dras (2015) Multilingual Native Language Identification. Natural Language Engineering, 1-53. [bib] [abstract]
  • Shervin Malmasi and Mark Dras (2015) Large-scale Native Language Identification with Cross-Corpus Evaluation. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2015), 1403-1409. Denver, CO, USA. [bib]
  • Shervin Malmasi, Hamed Hassanzadeh and Mark Dras (2015) Clinical Information Extraction using Word Representations. Proceedings of the Australasian Language Technology Workshop (ALTA), 66-74. Sydney, Australia. [bib] [word clusters]
  • Shervin Malmasi and Mark Dras (2015) Cognate Identification using Machine Translation. Proceedings of the Australasian Language Technology Workshop (ALTA), 138-141. Sydney, Australia. [bib]
  • Shervin Malmasi, Mark Dras and Irina Temnikova (2015) Norwegian Native Language Identification. Proceedings of Recent Advances in Natural Language Processing (RANLP 2015), 404-412. Hissar, Bulgaria. [bib]
  • Shervin Malmasi and Mark Dras (2015) Language Identification using Classifier Ensembles. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial 2015), 35-43. Hissar, Bulgaria. [bib]
  • Shervin Malmasi and Aoife Cahill (2015) Measuring Feature Diversity in Native Language Identification. Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, 49-55. Denver, CO, USA. [bib]
  • Shervin Malmasi, Joel Tetreault and Mark Dras (2015) Oracle and Human Baselines for Native Language Identification. Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, 172-178. Denver, CO, USA. [bib]
  • Maolin Wang, Shervin Malmasi and Mingxuan Huang (2015) The Jinan Chinese Learner Corpus. Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, 118-123. Denver, CO, USA. [bib]
  • Shervin Malmasi and Mark Dras (2015) Automatic Language Identification for Persian and Dari texts. Proceedings of the 14th Conference of the Pacific Association for Computational Linguistics (PACLING 2015), 59-64. Bali, Indonesia. [bib]
  • Shervin Malmasi and Mark Dras (2015) Location Mention Detection in Tweets and Microblogs. Proceedings of the 14th Conference of the Pacific Association for Computational Linguistics (PACLING 2015), 94-99. Bali, Indonesia. [bib]
  • Shervin Malmasi, Eshrag Refaee and Mark Dras (2015) Arabic Dialect Identification using a Parallel Multidialectal Corpus. Proceedings of the 14th Conference of the Pacific Association for Computational Linguistics (PACLING 2015), 209-217. Bali, Indonesia. [bib]
  • Shervin Malmasi (2015) Discriminating Similar Languages: Persian and Dari. Tiny Transactions on Computer Science (TinyToCS) Volume 3. [bib]

2014

  • Shervin Malmasi and Mark Dras (2014) Chinese Native Language Identification. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL-14), 95-99. Gothenburg, Sweden. [bib]
  • Shervin Malmasi and Mark Dras (2014) Language Transfer Hypotheses with Linear SVM Weights. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1385-1390. Doha, Qatar. [bib]
  • Shervin Malmasi and Mark Dras (2014) Arabic Native Language Identification. Proceedings of the Arabic Natural Language Processing Workshop, 180-186. Doha, Qatar. [bib]
  • Shervin Malmasi and Mark Dras (2014) From Visualisation to Hypothesis Construction for Second Language Acquisition. Proceedings of TextGraphs-9: the Workshop on Graph-based Methods for Natural Language Processing, 56-64. Doha, Qatar. [bib]
  • Shervin Malmasi and Mark Dras (2014) Finnish Native Language Identification. Proceedings of the Australasian Language Technology Workshop (ALTA), 139-144. Melbourne, Australia. [bib]
  • Shervin Malmasi and Mark Dras (2014) A Data-driven Approach to Studying Given Names and their Gender and Ethnicity Associations. Proceedings of the Australasian Language Technology Workshop (ALTA), 145-149. Melbourne, Australia. [bib]

2013

  • Shervin Malmasi, Sze-Meng Jojo Wong and Mark Dras (2013) NLI Shared Task 2013: MQ Submission. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (BEA-8), 124-133. Atlanta, Georgia, USA. [bib]

Other