 | Introduction |
(NOTE: recent experiments suggest that TAR2 is out-performed by TAR3, also
described below.)
TAR2 is a treatment learner or, more specifically, a data
mining/summarization tool. Treatment learners, like other
machine learners, are rule-discovery tools. However,
classical machine learners such as C4.5 aim at discovering
classification rules: i.e. given a classified training set,
they output rules that are predictive of the class
attribute. TAR2 differs from those learners in that:
- TAR2 assumes the classes are ordered by their scores (some
domain-specific measure).
- Highly scored classes are preferable to lower scored classes.
- Further, one class, called the best class, is more desirable
than all others.
- Rather than finding classification rules, TAR2 finds rules that
predict both an increased frequency of the best class and a
decreased frequency of the worst class.
That is, TAR2 finds discriminating rules that drive the system
away from the worst class toward the best class.
TAR2 inputs classified data logs and outputs treatments. A
treatment is a single attribute value or a conjunction of attribute
values. It is a constraint on the future controllable inputs of the
system. In summary, treatment learners give us controllers rather
than classifiers. To understand the distinction, consider the case
of someone reading a map. Classifiers say "you are here" on
the map while controllers say "go this way". You can
find a detailed illustration of how TAR2 works in
tar2intro.pdf
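The idea of a treatment can be sketched in a few lines. The following is a hypothetical illustration, not TAR2's code: a treatment is a conjunction of attribute=value constraints, and a good treatment selects a subset of the data whose class distribution shifts toward the best class. The attribute names and toy data are invented for this example.

```python
from collections import Counter

def apply_treatment(rows, treatment):
    """Keep only the rows matching every attribute=value pair in the treatment."""
    return [r for r in rows if all(r.get(a) == v for a, v in treatment.items())]

def class_distribution(rows):
    """Count how often each class appears in the (possibly treated) data."""
    return Counter(r["class"] for r in rows)

# Toy data: "good" is the best class, "bad" the worst.
rows = [
    {"outlook": "sunny", "windy": "yes", "class": "bad"},
    {"outlook": "sunny", "windy": "no",  "class": "good"},
    {"outlook": "rain",  "windy": "no",  "class": "good"},
    {"outlook": "rain",  "windy": "yes", "class": "bad"},
    {"outlook": "sunny", "windy": "no",  "class": "good"},
]

print(class_distribution(rows))                                   # before treatment
print(class_distribution(apply_treatment(rows, {"windy": "no"}))) # after treatment
```

Here the treatment windy=no selects a subset containing only best-class examples; TAR2's job is to search for such treatments automatically.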
|
 | Why that name? |
TAR2 is based on "TARZAN", a post-processor to a decision tree
learner that swung through the learnt trees looking for attribute
ranges that culled the greatest number of branches to "bad" classes
while preserving the greatest number of branches to "good" classes.
TARZAN is described in "Practical Large Scale What-if Queries:
Case Studies with Software Risk Assessment". FYI, TAR2 is much
faster, much simpler, and does not need the decision tree
pre-processor.
|
 | Installation |
Download the file tar2.zip shown at the bottom of this page.
Simply unzip tar2.zip and you get the following:
- Source code of TAR2 and an X-way cross-validation facility.
- DOS executables to run TAR2 and the X-way cross-validation
experiments (we are told, but can't confirm, that rebuilding
this for UNIX is just a matter of cd-ing to the source
directory and typing "make").
- Sample datasets and their output files.
- Documents, including instructions and several associated research
papers.
|
 | Files |
The directory structure of the un-zipped TAR2 system is as follows:
- README
- COPYRITE: includes the GPL-2 copy policy
- .\doc: user instructions and pdf's
- .\src: source files for TAR2, xvalprep and xval
- .\bin: all executables
- .\samples: sample data sets and output files
|
 | Invocation |
(Please read the user instruction manual.doc in the TAR2 package
before conducting any experiments.)
- Go into tar2\bin and type: tar2 filestem
To save the output to a file, type: tar2 filestem > filestem.out
Example: tar2 c:\tar2\samples\iris\iris > iris.out
- Also, you can put that one-line command in a batch file and
run TAR2 with your mouse.
|
 | Presentations |
When can we ignore stuff?: presentation to NASA AMES, July 2003.
|
 | Papers |
See also \doc in the download zip file. |
 | Tips |
- Make sure each of the three files
filestem.data, filestem.names, filestem.cfg
is in the correct format.
- The filestem should not be "XDF" when running tar2.
- First run: set all the parameters to their defaults; TAR2
prints out the deltaf distribution of the dataset.
- After the first run: set promising to a non-zero
value according to the deltaf distribution (generally, set
promising to a larger deltaf value; it can be a decimal) and
run TAR2 once more to get treatments.
- Increase nchanges to see if the results
improve. Generally, nchanges is less than 4.
- Use skew to control the size of the result set: some subsets that
satisfy certain treatments may be too small to be convincing.
Set skew = N to report only subsets that contain at least 1/N
of the best-class cases of the original set.
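The skew rule above can be sketched as follows. This is a hypothetical reimplementation of the check as described, not TAR2's source code:

```python
def passes_skew(original_classes, subset_classes, best_class, skew):
    """With skew = N, report a subset only if it retains at least 1/N
    of the best-class cases found in the original data set."""
    best_in_original = sum(1 for c in original_classes if c == best_class)
    best_in_subset = sum(1 for c in subset_classes if c == best_class)
    # Compare with multiplication to avoid floating-point division.
    return best_in_subset * skew >= best_in_original

# 10 best-class cases originally; with skew = 4 a subset must keep
# at least 10/4 = 2.5, i.e. 3 or more, best-class cases.
original = ["good"] * 10 + ["bad"] * 5
print(passes_skew(original, ["good"] * 3, "good", 4))  # True
print(passes_skew(original, ["good"] * 2, "good", 4))  # False
```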
|
 | Memory |
Under Windows 98, TAR2 easily handles 350,000 examples (13
attributes) in 64MB, but needs more memory (196MB suggested) to
handle more than (say) 550,000 examples (in 80 seconds). |
 | Author |
- TARZAN (a.k.a. TAR1):
Tim Menzies, with help from Erik Sinsel.
- TAR2 (alpha):
Tim Menzies. Awful Prolog prototype. Barely usable.
- TAR2, TAR3:
Ying Hu (with remote and contradictory advice from
Tim Menzies). Runs fast. Simple to use.
|
 | Download |
- TAR2.2:
start with .\dispatchTAR2\doc\TAR2intro.pdf.
- TAR3:
start with .\tar3\doc\TAR3manual.pdf.
|