DATE: Sun, 30 Mar 2003 07:20:30 -0800 (PST)
FROM: "Tim Menzies" <tim@menzies.us>
SUBJECT: something sooooo obvious (in retrospect)
TO: "kareem" <kammar@csee.wvu.edu>
CC: <justin@lostportal.net>, <huying_ca@yahoo.com>,
<drobo75@hotmail.com>, <lisa@nard.net>, <echiang@interchange.ubc.ca>
kareem,
your as-yet-unfinished IQ algorithm is the core of treatment learning.
no, that's not strong enough: IQ subsumes treatment learning.
treatment learning is just one particular parameter setting of IQ.
see, we score each treatment by its "worth" (the change in class
distribution over the baseline), its "support", its "stability" under
N-way cross-val, and a user-supplied "utility" score on the attributes
(e.g. i'd prefer to form treatments from these attributes rather than
those, since they are cheaper to control).
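to make that concrete, here is a minimal sketch of the kind of record
i have in mind; the field names and the as_point packing are my own
guesses, not a settled design:

  from dataclasses import dataclass
  from typing import Tuple

  @dataclass
  class Treatment:
      # a treatment is a conjunction of attribute=value constraints
      ranges: Tuple[str, ...]
      worth: float = 0.0      # change in class distribution over the baseline
      support: float = 0.0    # fraction of the examples the treatment covers
      stability: float = 0.0  # how steady the worth is under N-way cross-val (0 at gen 0)
      utility: float = 0.0    # user score on the attributes used (cheaper to control = better)

      def as_point(self) -> Tuple[float, float, float, float]:
          # one point in the 4-d space that qhull will see
          return (self.worth, self.support, self.stability, self.utility)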
at gen i (i>0), we seek combinations of treatments that take us to
higher worth, higher stability, higher support, and higher utility.
at gen 0, we initialize the treatments via confidence1, and all the
stability scores are zero since we have no cross-vals yet.
between generations, we run the N-way cross-vals on the treatments
that sit on the hull.
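for that stability update, something like the following is what i
have in mind; the nway_stability name, the fold splitting, and the
1/(1+spread) squashing are all placeholders i made up, and score(t,
rows) is assumed to return the worth of treatment t on those rows:

  import random
  from statistics import pstdev

  def nway_stability(treatment, rows, score, n=10):
      # shuffle the data, cut it into n folds, re-score the treatment's
      # worth on each fold, and report how steady that worth is
      rows = rows[:]
      random.shuffle(rows)
      folds = [rows[i::n] for i in range(n)]
      worths = [score(treatment, fold) for fold in folds]
      spread = pstdev(worths)
      return 1.0 / (1.0 + spread)  # 1 = perfectly stable; falls toward 0 as worth jumps around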
that is, the one under-the-hood IQ engine should be able to handle
defect detector generation (as we are currently planning) AND a
super-fast variant of treatment learning.
note that under this scheme there is no need to tell treatment
learning about "nchanges", since we keep combining "N" until there is
no significant change in the hull volume. and we don't need "skew" or
"promising". i think we might even be able to do away with
"granularity" and the need for tar3's random search, but that must be
the subject of another email.
does this sound right to you?
please send pseudocode for qhull.
timm
p.s. to the others: qhull is a multi-dimensional convex hull algorithm
(see the example at
http://www.cs.sunysb.edu/~algorith/files/convex-hull.shtml). in
n-dimensional optimization there is one "sweet spot" we are trying to
reach and some number of candidate points scattered around that spot.
with standard qhull, we can find the points nearest that sweet spot.
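one convenient way to try this: the public-domain qhull code is
wrapped by scipy as scipy.spatial.ConvexHull, so a single-generation
run is only a few lines (the random points below are just a stand-in
for real scored detectors):

  import numpy as np
  from scipy.spatial import ConvexHull  # scipy's wrapper around the qhull library

  points = np.random.rand(300, 4)       # 300 candidates scored on 4 objectives
  hull = ConvexHull(points)

  print(hull.vertices)                  # indices of the points that sit on the hull
  print(hull.volume)                    # the hull volume, used to compare generations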
in kareem's idea, we take the points on the hull near the sweet spot
and combine them to form new points. we then assess the new points,
add them to the old points, and run a second-generation qhull. now the
hull contains the best combinations of generation one and generation
two. if the volume of the generation-two hull is greater than the
volume of the generation-one hull, then repeat.
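a sketch of that loop, assuming a combine(a, b) that conjoins two
parent treatments into a new candidate and a score(t) that returns its
tuple of objectives (both are placeholders, and the 1% volume
tolerance and the 20-generation cap are arbitrary guesses):

  from itertools import combinations
  import numpy as np
  from scipy.spatial import ConvexHull

  def iq_generations(treatments, combine, score, max_gens=20, tolerance=0.01):
      # treatments: a list of candidates; score(t) -> tuple of objectives to maximize
      points = np.array([score(t) for t in treatments])
      hull = ConvexHull(points)
      for _ in range(max_gens):
          # pair up the treatments whose scores sit on the current hull
          parents = [treatments[i] for i in hull.vertices]
          children = [combine(a, b) for a, b in combinations(parents, 2)]
          treatments = treatments + children
          points = np.vstack([points, [score(c) for c in children]])
          new_hull = ConvexHull(points)
          # keep going only while the hull volume grows significantly
          if new_hull.volume <= hull.volume * (1.0 + tolerance):
              return new_hull, treatments
          hull = new_hull
      return hull, treatments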
it is not clear if this will work (it is not implemented yet). but
public domain code for single-generation qhull is all over the web and
it runs FAST in up to ten dimensions. heck, let's give it a go.
it is also not clear if this is a heuristic or a complete search. to
prove that it is complete, we would have to show that any combination
of points on the hull is itself on the hull, or better. i suspect that
this will only be true if the dimensions have some restrictive
properties (something that promises monotonicity).
but even if it's only a heuristic, it is still a "wow". the cull
offered by qhull is amazing. in one study, kareem found that only six
points out of 300 possible defect detectors were on the hull, which
meant that the other 294 could be ignored.
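(with the scipy sketch above, that cull is just a matter of reading
off hull.vertices: the six survivors are points[hull.vertices], and
the remaining 294 rows never need to be looked at again.)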