 | Introduction |
(NOTE: recent experiments suggest that TAR2 is out-performed by TAR3, also
described below.)
TAR2 is a treatment learner or, more specifically, a data
mining/summarization tool. Treatment learners, like other
machine learners, are rule-discovery tools. However,
classical machine learners such as C4.5 aim at discovering
classification rules: i.e. given a classified training set,
they output rules that are predictive of the class
attribute. TAR2 differs from those learners in that:
- TAR2 assumes the classes are ordered by their scores (some
domain-specific measure).
- Highly scored classes are preferable to lower scored classes.
- Further, one class, called the best class, is more desirable
than all others.
- Rather than finding classification rules, TAR2 finds rules that
predict both an increased frequency of the best class and a
decreased frequency of the worst class.
That is, TAR2 finds discriminating rules that drive the system
away from the worst class toward the best class.
TAR2 inputs classified data logs and outputs treatments. A
treatment is a single attribute value or a conjunction of attribute
values. It is a constraint on the future controllable inputs of the
system. In summary, treatment learners give us controllers rather
than classifiers. To understand the distinction, consider the case
of someone reading a map. Classifiers say "you are here" on
the map while controllers say "go this way". You can
find a detailed illustration of how TAR2 works in
tar2intro.pdf
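The idea of a treatment can be sketched in a few lines. The following is a hypothetical illustration, not TAR2's code: a treatment is a conjunction of attribute=value constraints, and a good treatment selects a subset of the data whose class distribution shifts toward the best class. The attribute names and toy data are invented for this example.

```python
from collections import Counter

def apply_treatment(rows, treatment):
    """Keep only the rows matching every attribute=value pair in the treatment."""
    return [r for r in rows if all(r.get(a) == v for a, v in treatment.items())]

def class_distribution(rows):
    """Count how often each class appears in the (possibly treated) data."""
    return Counter(r["class"] for r in rows)

# Toy data: "good" is the best class, "bad" the worst.
rows = [
    {"outlook": "sunny", "windy": "yes", "class": "bad"},
    {"outlook": "sunny", "windy": "no",  "class": "good"},
    {"outlook": "rain",  "windy": "no",  "class": "good"},
    {"outlook": "rain",  "windy": "yes", "class": "bad"},
    {"outlook": "sunny", "windy": "no",  "class": "good"},
]

print(class_distribution(rows))                                   # before treatment
print(class_distribution(apply_treatment(rows, {"windy": "no"}))) # after treatment
```

Here the treatment windy=no selects a subset containing only best-class examples; TAR2's job is to search for such treatments automatically.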
|
 | Why that name? |
TAR2 is based on "TARZAN", a post-processor to a decision tree
learner that swung through the learnt trees looking for attribute
ranges that culled the greatest number of branches to "bad" classes
while preserving the greatest number of branches to "good" classes.
TARZAN is described in "Practical Large Scale What-if Queries:
Case Studies with Software Risk Assessment". FYI, TAR2 is much
faster, much simpler, and does not need the decision tree
pre-processor.
|
 | Installation |
Download the file tar2.zip shown at the bottom of this page.
Simply unzip tar2.zip and you get the following:
- Source code of TAR2 and an X-way cross-validation facility.
- DOS executables to run TAR2 and the X-way cross-validation
experiments (we are told, but can't confirm, that rebuilding
this for UNIX is just a matter of cd-ing to the source
directory and typing "make").
- Sample datasets and their output files.
- Documents, including instructions and several associated research
papers.
|
 | Files |
The directory structure of the un-zipped TAR2 system is as follows:
- README
- COPYRITE: includes the GPL-2 copy policy
- .\doc: user instructions and pdf's
- .\src: source files for TAR2, xvalprep and xval
- .\bin: all executables
- .\samples: sample data sets and output files
|
 | Invocation |
(Please read the user instruction manual.doc in the TAR2 package
before conducting any experiments.)
- Go into tar2\bin and type: tar2 filestem
To save the output to a file, type: tar2 filestem > filestem.out
Example: tar2 c:\tar2\samples\iris\iris > iris.out
- Also, you can put that one-line command in a batch file and
run TAR2 with your mouse.
|
 | Presentations |
When can we ignore stuff?: presentation to NASA AMES, July 2003.
|
 | Papers |
See also \doc in the download zip file. |
 | Tips |
- Make sure each of the three files
filestem.data, filestem.names, filestem.cfg
is in the correct format.
- The filestem should not be "XDF" when running tar2.
- First run: set all the parameters to their defaults; TAR2
prints out the deltaf distribution of the dataset.
- After the first run: set promising to a non-zero
value according to the deltaf distribution (generally, set
promising to a larger deltaf value; it can be a decimal) and
run TAR2 once more to get treatments.
- Increase nchanges to see if the results
improve. Generally, nchanges is less than 4.
- Use skew to control the size of the result set: some subsets that
satisfy certain treatments may be too small to be convincing.
Set skew = N to report only subsets that contain at least 1/N
of the best-class cases of the original set.
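The skew rule above can be sketched as follows. This is a hypothetical reimplementation of the check as described, not TAR2's source code:

```python
def passes_skew(original_classes, subset_classes, best_class, skew):
    """With skew = N, report a subset only if it retains at least 1/N
    of the best-class cases found in the original data set."""
    best_in_original = sum(1 for c in original_classes if c == best_class)
    best_in_subset = sum(1 for c in subset_classes if c == best_class)
    # Compare with multiplication to avoid floating-point division.
    return best_in_subset * skew >= best_in_original

# 10 best-class cases originally; with skew = 4 a subset must keep
# at least 10/4 = 2.5, i.e. 3 or more, best-class cases.
original = ["good"] * 10 + ["bad"] * 5
print(passes_skew(original, ["good"] * 3, "good", 4))  # True
print(passes_skew(original, ["good"] * 2, "good", 4))  # False
```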
|
 | Memory |
Under Windows 98, TAR2 easily handles 350,000 examples (13
attributes) in 64MB, but needs more memory (196MB suggested) to
handle more than (say) 550,000 examples (in 80 seconds). |
 | Author |
- TARZAN (a.k.a. TAR1):
Tim Menzies, with help from Erik Sinsel.
- TAR2 (alpha):
Tim Menzies. Awful Prolog prototype. Barely usable.
- TAR2, TAR3:
Ying Hu (with remote and contradictory advice from
Tim Menzies). Runs fast. Simple to use.
|
 | Download |
- TAR2.2:
start with .\dispatchTAR2\doc\TAR2intro.pdf.
- TAR3:
start with .\tar3\doc\TAR3manual.pdf.
|