Treatment learning

Introduction


(NOTE: recent experiments suggest that TAR2 is outperformed by TAR3, which is also described below.)

TAR2 is a treatment learner or, more specifically, a data mining/summarization tool. Treatment learners, like other machine learners, are rule-discovery paradigms. However, classical machine learners like C4.5 aim at discovering classification rules: given a classified training set, they output rules that predict the class attribute. TAR2 differs from those learners in that:

  • TAR2 assumes the classes are ordered by their scores (some domain-specific measure).
  • Highly scored classes are preferable to lower scored classes.
  • Further, one class, called the best class, is more desirable than all the others.
  • Rather than finding classification rules, TAR2 finds rules that predict both an increased frequency of the best class and a decreased frequency of the worst class.
That is, TAR2 finds discriminating rules that drive the system away from the worst class and toward the best class.

TAR2 inputs classified data logs and outputs treatments. A treatment is a single attribute value, or a conjunction of attribute values, that constrains the future controllable inputs of the system. In summary, treatment learners give us controllers rather than classifiers. To understand the distinction, consider someone reading a map: classifiers say "you are here" on the map, while controllers say "go this way".
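
To make that concrete, here is a toy sketch of the idea in Python (an illustration only, not TAR2's code: the data set, the class scores, and the lift measure below are all made up for the example):

    # A toy treatment learner (an illustrative sketch only; not TAR2's code).
    # Each example is a dict of attribute values plus a class; classes are
    # ordered by a domain-specific score, with "best" the most desirable.

    SCORES = {"bad": 0, "ok": 1, "best": 2}          # assumed class scores
    DATA = [                                         # assumed toy data set
        {"outlook": "sunny", "wind": "high", "class": "bad"},
        {"outlook": "sunny", "wind": "low",  "class": "ok"},
        {"outlook": "rain",  "wind": "low",  "class": "best"},
        {"outlook": "rain",  "wind": "high", "class": "ok"},
        {"outlook": "rain",  "wind": "low",  "class": "best"},
    ]

    def worth(rows):
        """Mean class score of a set of rows: higher means closer to 'best'."""
        return sum(SCORES[r["class"]] for r in rows) / len(rows)

    def treatments(rows):
        """Score every attr=value constraint by how far it lifts the class
        distribution above the baseline (a stand-in for TAR2's deltaf)."""
        baseline = worth(rows)
        results = []
        for attr in rows[0]:
            if attr == "class":
                continue
            for value in {r[attr] for r in rows}:
                subset = [r for r in rows if r[attr] == value]
                results.append((worth(subset) - baseline,
                                f"{attr}={value}", len(subset)))
        return sorted(results, reverse=True)

    for lift, rule, size in treatments(DATA):
        print(f"{rule:15} lift={lift:+.2f} ({size} examples)")

On this toy data, outlook=rain and wind=low come out on top; TAR2 performs roughly this kind of search, but over conjunctions of values and much larger data sets.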

You can find a detailed illustration of how TAR2 works in tar2intro.pdf.

Why that name?


TAR2 is based on "TARZAN", a post-processor to a decision tree learner that swung through the learnt trees looking for the attribute ranges that culled the most branches to "bad" classes while preserving the most branches to "good" classes. TARZAN is described in:
Practical Large Scale What-if Queries: Case Studies with Software Risk Assessment

FYI, TAR2 is much faster, much simpler, and does not need the decision tree pre-processor.
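
As a rough sketch of the TARZAN idea described above (the branch encoding and the scoring below are assumptions made for illustration; the original worked over trees produced by a decision tree learner):

    # Illustrative sketch of the TARZAN idea (assumed encoding; not the
    # original code). A learnt tree is reduced to its root-to-leaf branches:
    # each branch is a list of (attribute, value) tests plus its leaf class.

    BRANCHES = [                                     # assumed toy tree
        ([("outlook", "sunny"), ("wind", "high")], "bad"),
        ([("outlook", "sunny"), ("wind", "low")],  "good"),
        ([("outlook", "rain")],                    "good"),
    ]

    def score(banned):
        """If the attribute values in `banned` are forbidden in future
        inputs, count the 'bad' branches that get culled and the 'good'
        branches that survive."""
        culled_bad = sum(1 for tests, cls in BRANCHES
                         if cls == "bad" and any(t in banned for t in tests))
        kept_good = sum(1 for tests, cls in BRANCHES
                        if cls == "good" and not any(t in banned for t in tests))
        return culled_bad, kept_good

    # Try banning each single attribute value that appears in the tree.
    for ban in sorted({t for tests, _ in BRANCHES for t in tests}):
        print(ban, score({ban}))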

Installation


Download the file tar2.zip shown at the bottom of this page.

Simply unzip tar2.zip and you get the following:

  • Source code for TAR2 and an X-way cross-validation facility.
  • DOS executables for running TAR2 and the X-way cross-validation experiments (we are told, but can't confirm, that rebuilding these for UNIX is just a matter of cd-ing to the source directory and typing "make").
  • Sample datasets and their output files.
  • Documentation, including instructions and several associated research papers.
Files


The unzipped TAR2 system contains directories for the TAR2 executables (\bin), the source code, the sample datasets (\samples), and the documentation (\doc).
Invocation


(Please read user instruction manual.doc in the TAR2 package before conducting any experiments.)

  • Go into tar2\bin and type: tar2 filestem > filestem.out.
    Example: tar2 c:\tar2\samples\iris\iris > iris.out
  • Alternatively, you can put that one-line command in a batch file and run TAR2 with your mouse.
Presentations


When can we ignore stuff?: a presentation to NASA Ames, July 2003.
Papers


See also \doc in the download zip files.

Tips


  • Make sure that each of the three files filestem.data, filestem.names, and filestem.cfg is in the correct format.
  • The filestem should not be "XDF" when running tar2.
  • First run: leave all the parameters at their defaults; TAR2 then prints out the deltaf distribution of the dataset.
  • After the first run: set promising to a non-zero value chosen from the deltaf distribution (generally, pick one of the larger deltaf values; it can be a decimal) and run TAR2 once more to get treatments.
  • Increase nchanges to see if the results improve. Generally, nchanges is less than 4.
  • Use skew to control the size of the result set: some subsets that satisfy a treatment may be too small to be convincing. Set skew = N to report only subsets that contain at least 1/N of the best-class cases of the original set (see the sketch after this list).
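
Here is a rough sketch of how those three parameters might interact (an illustration only, not TAR2's internals; in particular, the lift measure below is a frequency-based stand-in for TAR2's deltaf score):

    # Illustrative filter mimicking the promising / nchanges / skew parameters
    # described above (an assumption for illustration; not TAR2's internals).

    def acceptable(treatment, data, best_class,
                   promising=0.2,   # minimum lift (deltaf-style) to report
                   nchanges=2,      # maximum attribute values per treatment
                   skew=4):         # keep >= 1/skew of all best-class cases
        """treatment: dict mapping attribute -> required value;
        data: list of dicts, each with a 'class' entry."""
        if len(treatment) > nchanges:
            return False
        subset = [row for row in data
                  if all(row.get(a) == v for a, v in treatment.items())]
        if not subset:
            return False
        best_total = sum(1 for row in data if row["class"] == best_class)
        best_kept = sum(1 for row in subset if row["class"] == best_class)
        lift = best_kept / len(subset) - best_total / len(data)
        return lift >= promising and best_kept * skew >= best_total

    # Example call (hypothetical data): acceptable({"outlook": "rain"}, data, "best")
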
Memory


Under Windows 98, TAR2 easily handles 350,000 examples (13 attributes) in 64MB of memory, but it needs more (we suggest 196MB) to handle more than, say, 550,000 examples (which run in about 80 seconds).
Author


TARZAN (a.k.a. TAR1):
Tim Menzies, with help from Erik Sinsel.
TAR2 (alpha):
Tim Menzies. An awful Prolog prototype; barely usable.
TAR2, TAR3:
Ying Hu (with remote and contradictory advice from Tim Menzies). Runs fast; simple to use.
Download


  • TAR2.2; start with .\dispatchTAR2\doc\TAR2intro.pdf.
  • TAR3; start with .\tar3\doc\TAR3manual.pdf.