
README for inside-outside/

(c) Mark_Johnson@Brown.edu, 21st August 2000; 
(c) Mark_Johnson@Brown.edu, hacked 12th July 2004
(c) Mark_Johnson@Brown.edu, hacked again 2nd September 2007 
    to add Variational Bayes
(c) Mark.Johnson@MQ.edu.au, hacked again on 20th October 2012 
    to read the bias parameter (i.e., Dirichlet prior) first
(c) Mark.Johnson@MQ.edu.au, hacked again on 6th April 2013
    to fix bug in grammar.c::dump_grammar() found by Tetsuo Kiso

A simple "make" should suffice to build the program io.

For fastest performance, set the environment variable GCCFLAGS
for the particular architecture you will be running on.  If you're
using a reasonably modern gcc, you should be able to say

      export GCCFLAGS="-march=native"

or 

      setenv GCCFLAGS="-march=native"

and it will compile the program assuming it will be run on the
same machine it is being compiled on.

-------------------------------------------------------------------

This version adds a Dirchlet prior (in the form of pseudo-counts)
to the rule counts.

The main change is that the grammar file should consist of lines of
the form:

[Pseudocount [InitialWeight]] Parent --> Child1 ... Childn

If InitialWeight and Pseudocount are absent, InitialWeight defaults to
1 and Pseudocount defaults to a value specified on the command line
(default is zero, which is definitely *not* what you want if you're doing
Variational Bayes!).

If either Weight or Pseudocount are not present, then Parent should not
be anything that scanf might confuse with a number (i.e., scanf 
attempts to read two numbers from the start of each line).

The start category is the Parent of the first rule in the grammar.

WARNING: if you are doing Variational Bayes inference, then the Pseudocount
should be greater than zero!


-------------------------------------------------------------------

This is an implementation of the inside-outside algorithm for
estimating the probabilities of PCFG productions.  While there are
several places where better indexing might improve performance, I have
paid attention to ensuring that the algorithm runs in n^3 time and in
time linear in grammar size.  I have used this program with grammars
with tens of thousands of productions, trained using tens of thousands
of sentences (it may take a week).

This program may be freely used for research purposes; however,
I do request acknowledgement in any publications containing results
that were produced with this program or code derived from this
program.

This implementation assumes that all terminals are introduced
by non-recursive unary rules.  This lets me optimize these
rules (so it is possible to have a lot of them; in the tens of
thousands).

Usage: see the top of the io.c file, or run the program with the -h flag.
You can also get a good idea of how to run this by looking at the
Makefile examples (testvb and testio were the examples when I last
updated this file).


Compilation:
-----------

The program is written in ANSI C, and should compile just by
saying ``make'', to produce an executable ``io''.

Setting optimization flags makes a big difference, as the program
spends a lot of its time in tight loops.  I find I get a big speed
up by turning on as much optimization as possible.



Toy Examples:
------------

The file testengger.lt contains a sample initial grammar for
a tiny subset which generates both English and German verb phrase
word orders.  In addition, every preterminal (i.e., part of speech)
rewrites to every terminal (word) that appears in the training
data.

The file testeng.yld contains 5 very simple English sentences, and
the file testger.yld contains 5 very simple sentences with English
words but in German word order (e.g., ``a man the dog a bone gives'').


If you run 

	make testio

you get the following output:

 make testio
 io -d 1000 -g testengger.lt testeng.yld 
 # rule_bias_default (-a) = 0, stoptol (-s) = 1e-07, minruleprob (-p) = 1e-20, maxlen (-l) = 0, minits (-m) = 0, maxits = (-n) = 0, annealstart (-b) = 1, annealstop (-B) = 1, nanneal (-N) = 0, jitter0 (-j) = 0, jitter (-J) = 0, VariationalBayes (-V) = 0, wordscale (-W) = 1, randseed (-S) = 97,debuglevel (-d) = 1000, nruns (-R) = 1
 # Run 0
 # Iteration	temperature	nrules	-logP	bits/token
 0	1	29	61.0128	3.03527
 1	1	23	42.6594	2.12222
 2	1	23	30.227	1.50374
 3	1	23	27.8137	1.38368
 4	1	18	27.8111	1.38355
 5	1	12	27.8111	1.38355
 6	1	12	27.8111	1.38355
 # run 0, entropy 1.38355, 12 rules
 1	S1 --> S
 1	S --> NP VP
 1	NP --> Det N
 0.6	VP --> V NP
 0.4	VP --> V NP NP
 0.416667	Det --> the
 0.583333	Det --> a
 0.416667	N --> dog
 0.333333	N --> man
 0.25	N --> bone
 0.6	V --> bites
 0.4	V --> gives


 Compilation finished at Sun Sep  2 13:05:28

Rules with probability less than 10^-7 are silently ignored.
You can see that the algorithm has identified the phrase structure
rules for English syntax, and has also correctly identified the
parts of speech for the words.

