README for oi  -- odds ratio lower bound estimator

(c) Mark Johnson, 5th September 2000
    Brown University
    Email: Mark_Johnson@Brown.edu
    Web: http://www.cog.brown.edu/~mj

The program(s) distributed with this file are made available freely
for research purposes only.  Please contact me if you are interested
in commercial applications.  I request acknowledgement if results from
this program, or from a program derived from this code, appear in a
publication.

                                NO WARRANTY

BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT
WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
PROGRAM IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF
THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO
LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.


Compilation:
===========

The programs here are written in ANSI C and should compile with any
standard C compiler.  It should only be necessary to run "make".  The
code is written to benefit from compiler optimization, especially
inlining.  Since compilers differ in the flags they require for such
optimization, it is up to you to set the environment variable CFLAGS
appropriately.


Usage:
=====

	oi alpha

Alpha, the significance level, should be a real number greater than
zero and less than one.  A typical value is 0.05.
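If you write a wrapper around oi, or re-implement it, the alpha argument
can be validated as below.  This is a minimal sketch: the helper name
parse_alpha is hypothetical and is not taken from oi's source.  It
rejects empty strings, trailing characters such as "0.05x", and any
value outside the open interval (0, 1).

```c
#include <stdlib.h>

/* Hypothetical helper (not part of oi): convert a command-line string
   to a significance level.  Returns 1 and stores the value on success,
   0 if the string is not a number strictly between 0 and 1. */
int parse_alpha(const char *arg, double *alpha)
{
  char *end;
  double a = strtod(arg, &end);

  /* reject empty input and trailing junk */
  if (end == arg || *end != '\0')
    return 0;
  /* require 0 < a < 1; a NaN fails both comparisons and is rejected */
  if (!(a > 0.0 && a < 1.0))
    return 0;
  *alpha = a;
  return 1;
}
```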

The program reads lines from standard input and writes an equal number
of lines to standard output.  Each input line should begin with two
ratios of non-negative integers, r1/n1 and r2/n2, which are interpreted
as samples from two different distributions.  It must be the case that
n1>=r1>=0, n1>=1, n2>=r2>=0 and n2>=1.  The program copies each input
line to the output, prepending a real number p, which is the lower
limit of a confidence interval, at significance level alpha, for the
conditional odds ratio of the 2x2 table

	( r1    ,    n1-r1  )
	( r2    ,    n2-r2  )
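The start of each input line can be checked with sscanf, as in the
sketch below.  The helper parse_counts is hypothetical (it is not part
of oi), and it assumes that the two ratios are separated by whitespace,
that no whitespace appears inside a ratio, and that n2>=1 is required
just like n1>=1.  The %n conversion records where the two ratios end,
so the rest of the line can be echoed to the output unchanged.

```c
#include <stdio.h>

/* Hypothetical helper (not part of oi): read "r1/n1 r2/n2" from the
   start of line and check n1>=r1>=0, n1>=1, n2>=r2>=0, n2>=1.
   On success returns 1 and sets *rest to the offset of the remainder
   of the line; returns 0 on malformed or out-of-range input. */
int parse_counts(const char *line, long *r1, long *n1,
                 long *r2, long *n2, int *rest)
{
  int used = 0;

  /* %n stores the number of characters consumed so far */
  if (sscanf(line, " %ld/%ld %ld/%ld%n", r1, n1, r2, n2, &used) != 4)
    return 0;
  if (*r1 < 0 || *n1 < 1 || *r1 > *n1 ||
      *r2 < 0 || *n2 < 1 || *r2 > *n2)
    return 0;
  *rest = used;
  return 1;
}
```

For the line "5/10 16/100 sample1 2" this stores r1=5, n1=10, r2=16,
n2=100 and an offset of 11, which points at " sample1 2".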

A conditional estimator for the odds ratio is one that estimates the
odds ratio using the distribution of r1 conditioned on the row and
column totals.  Conditional estimators may not be appropriate in some
situations.  The issue is complicated: please see textbooks on
Categorical Data Analysis (under Exact Statistics) for information.
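For reference, the conditional distribution meant here is the standard
one for a 2x2 table with fixed margins, Fisher's noncentral
hypergeometric distribution.  Write m = r1 + r2 for the first column
total, pi1 and pi2 for the underlying success probabilities of the two
distributions (notation introduced here), and psi for their odds ratio.
This is textbook background, not a description of how oi evaluates it.

```latex
P(R_1 = x \mid n_1, n_2, m;\, \psi)
  = \frac{\binom{n_1}{x}\binom{n_2}{m-x}\,\psi^{x}}
         {\sum_{y=\ell}^{u} \binom{n_1}{y}\binom{n_2}{m-y}\,\psi^{y}},
\qquad
\ell = \max(0,\, m-n_2) \le x \le u = \min(n_1,\, m),
\qquad
\psi = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}
```

When psi = 1 this reduces to the ordinary hypergeometric distribution
used in Fisher's exact test.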

This procedure is computationally intensive; the effort grows linearly
with n1+n2.


Example:
=======

Suppose the input file sample.dat contains:

	5/10 16/100 sample1 2
	25/100 160/1000 sample3 4

Then running

	oi 0.1 < sample.dat

produces as output

	1.323577         5/10 16/100 sample1 2
	1.121612         25/100 160/1000 sample3 4

The prefixed number is the lower bound, at significance level 0.1, on
the odds ratio of the two distributions whose samples are given on
each line.  If you interpret the odds ratio as a measure of how
different the two distributions are, this number is a lower bound on
how different they are.
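For comparison, the plain sample odds ratio of the table is
(r1/(n1-r1)) / (r2/(n2-r2)).  The hypothetical function below computes
this point estimate; it is not oi's output.  For the two example lines
it gives 5.25 and 1.75, so the gap between point estimate and oi's
lower bound is much larger for the first sample (5.25 vs 1.323577) than
for the second (1.75 vs 1.121612), which is based on ten times as many
observations.

```c
/* Hypothetical helper: sample odds ratio of the table
     ( r1 , n1-r1 )
     ( r2 , n2-r2 )
   computed as r1*(n2-r2) / (r2*(n1-r1)).  Returns -1.0 when the
   denominator is zero, i.e. the ratio is undefined or infinite. */
double sample_odds_ratio(long r1, long n1, long r2, long n2)
{
  double den = (double) r2 * (double) (n1 - r1);

  if (den == 0.0)
    return -1.0;
  return (double) r1 * (double) (n2 - r2) / den;
}
```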

As alpha becomes smaller, the confidence bounds become wider and the
measure becomes more conservative.  Thus

	oi 0.01 < sample.dat

produces as output

	0.662043         5/10 16/100 sample1 2
	0.874781         25/100 160/1000 sample3 4

The Unix sort command can be used to sort the output of the oi program
so that the highest scoring samples come first.

	oi 0.01 < sample.dat | sort -k1,1 -nr

produces

	0.874781         25/100 160/1000 sample3 4
	0.662043         5/10 16/100 sample1 2
