The tutorial will be on Minimum Length Encoding, encompassing both Minimum Message Length (MML) and Minimum Description Length (MDL) inductive inference. This work is information-theoretic in nature, with a broad range of applications in machine learning, statistics, "knowledge discovery" and "data mining".
We discuss the following topics: statistical parameter estimation; mixture modelling (or clustering) of continuous, discrete and circular data; clustering with correlated attributes; learning decision trees; learning decision trees with Markov model leaf regressions; learning probabilistic finite state machines; and possibly other problems if time permits.
MML is statistically consistent and efficient, meaning that it converges as quickly as possible to any true underlying data-generating process. It is also invariant under 1-to-1 re-parameterisation of the problem and has a strong track record in problems of machine learning, statistics and "data mining".
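As a small illustration of MML parameter estimation, the following Python sketch computes the Wallace and Freeman (1987) message length for a binomial parameter. It is a minimal sketch under stated assumptions, not code from the tutorial or from Snob: the function names are hypothetical, the prior on p is assumed uniform on (0, 1), the data are taken to be one particular sequence of n trials with s successes, and lengths are in nats.

```python
import math


def mml87_binomial_length(s, n, p):
    """Approximate two-part message length (in nats) for s successes in n
    trials when the success probability is stated as p.

    Assumptions (illustrative): uniform prior on p, so -log(prior) = 0,
    and the Wallace-Freeman (1987) approximation with one parameter.
    """
    fisher = n / (p * (1.0 - p))  # expected Fisher information for p
    neg_log_likelihood = -(s * math.log(p) + (n - s) * math.log(1.0 - p))
    kappa_1 = 1.0 / 12.0  # quantising lattice constant in one dimension
    return 0.5 * math.log(fisher) + neg_log_likelihood + 0.5 * (1.0 + math.log(kappa_1))


def mml87_binomial_estimate(s, n):
    """Closed-form minimiser of mml87_binomial_length over p (uniform prior)."""
    return (s + 0.5) / (n + 1.0)


if __name__ == "__main__":
    s, n = 3, 10
    p_hat = mml87_binomial_estimate(s, n)
    print("MML estimate of p:", p_hat)
    print("message length (nats):", mml87_binomial_length(s, n, p_hat))
```

Under these assumptions the estimate (s + 1/2)/(n + 1) differs from the maximum likelihood estimate s/n, and it stays strictly inside (0, 1) even when s = 0 or s = n.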
We will also show the successes of MML compared to other methods both in fitting polynomial functions and in modelling and fitting an alternating binary process.
Some of the above machine learning techniques will then be applied to real-world problems, such as protein structure prediction and the human genome project, lossless image compression, exploration geology, business forecasting, market inefficiency and natural language. Passing mention will be made of foundational issues such as connections to Kolmogorov-Solomonoff-Chaitin complexity (see the recent special issue of the Computer Journal), universal modelling and (probabilistic) prediction.
There will be some flexibility in the depth and scope of material presented depending upon the preferences of the attendees, and attendees are very welcome and encouraged to contact the presenter in advance if there are any particular topics that they would like to see covered. The time allocated for the tutorial is 3 hours, but the presenter is willing to go overtime if sufficient interest warrants this.
Dr David Dowe works primarily with Lloyd Allison, Trevor Dix, Chris Wallace and others in the Minimum Message Length (MML) group at the School of Computer Science and Software Engineering at Monash University. Most of his work for the past 8 1/2 years has been in the theory and applications of the (information-theoretic) MML principle of statistical and inductive inference and machine learning (and "knowledge discovery" and "data mining"), a principle which dates back to Wallace and Boulton (Comp. J., 1968), and which has been surveyed more recently in Wallace and Freeman (J. Roy. Stat. Soc., 1987) and Wallace and Dowe (Comp. J., 1999). David was Program Chair of the Information, Statistics and Induction in Science (ISIS) conference, held in Melbourne, Australia on 20-23 August 1996, which was attended by R. J. Solomonoff, C. S. Wallace, J. J. Rissanen, J. R. Quinlan, Marvin Minsky, and others.
Chris Wallace and David Dowe are authors of the Snob program for unsupervised clustering and mixture modelling. Snob does Minimum Message Length (MML) mixture modelling of Gaussian, discrete multi-state (Bernoulli or categorical), Poisson and von Mises circular distributions. The Snob software is available (subject to conditions) for private, academic use.
David's earlier tutorial presentations of MML include: