DATE: Sun, 30 Mar 2003 07:20:30 -0800 (PST)
FROM: "Tim Menzies" <tim@menzies.us>
SUBJECT: something sooooo obvious (in retrospect)
TO: "kareem" <kammar@csee.wvu.edu>
CC: <justin@lostportal.net>, <huying_ca@yahoo.com>,
<drobo75@hotmail.com>, <lisa@nard.net>, <echiang@interchange.ubc.ca>
kareem,
your as-yet-unfinished IQ algorithm is the core of treatment learning.
no, that's not strong enough: IQ subsumes treatment learning.
treatment learning is just one particular parameter setting of IQ.
see, we score each treatment by its "worth" (the change in class
distribution over the baseline), its "support", its "stability" under
N-way cross-val, and a user-supplied "utility" score on the attributes
(e.g. i'd prefer to form treatments from these attributes rather than
those, since they are cheaper to control).
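to make that concrete, here is a minimal sketch of the kind of record
i have in mind; the field names and the as_point packing are my own
guesses, not a settled design:

  from dataclasses import dataclass
  from typing import Tuple

  @dataclass
  class Treatment:
      # a treatment is a conjunction of attribute=value constraints
      ranges: Tuple[str, ...]
      worth: float = 0.0      # change in class distribution over the baseline
      support: float = 0.0    # fraction of the examples the treatment covers
      stability: float = 0.0  # how steady the worth is under N-way cross-val (0 at gen 0)
      utility: float = 0.0    # user score on the attributes used (cheaper to control = better)

      def as_point(self) -> Tuple[float, float, float, float]:
          # one point in the 4-d space that qhull will see
          return (self.worth, self.support, self.stability, self.utility)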
at gen i (i>0), we seek combinations of treatments that take us to
higher worth, higher stability, higher support, and higher utility.
at gen 0, we initialize the treatments via confidence1, and all the
stability scores are zero since we have no cross-vals yet.
between generations, we run the N-way cross-vals on the treatments
that sit on the hull.
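for that stability update, something like the following is what i
have in mind; the nway_stability name, the fold splitting, and the
1/(1+spread) squashing are all placeholders i made up, and score(t,
rows) is assumed to return the worth of treatment t on those rows:

  import random
  from statistics import pstdev

  def nway_stability(treatment, rows, score, n=10):
      # shuffle the data, cut it into n folds, re-score the treatment's
      # worth on each fold, and report how steady that worth is
      rows = rows[:]
      random.shuffle(rows)
      folds = [rows[i::n] for i in range(n)]
      worths = [score(treatment, fold) for fold in folds]
      spread = pstdev(worths)
      return 1.0 / (1.0 + spread)  # 1 = perfectly stable; falls toward 0 as worth jumps around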
that is, the one under-the-hood IQ engine should be able to handle
defect detector generation (as we are currently planning) AND a
super-fast variant of treatment learning.
note that under this scheme there is no need to tell treatment
learning about "nchanges", since we keep combining "N" until there is
no significant change in the hull volume. and we don't need "skew" or
"promising". i think we might even be able to do away with
"granularity" and the need for tar3's random search, but that must be
the subject of another email.
does this sound right to you?
please send pseudocode for qhull.
timm
p.s. to the others: qhull is a multi-dimensional convex hull algorithm
(see the example at
http://www.cs.sunysb.edu/~algorith/files/convex-hull.shtml). in
n-dimensional optimization there is one "sweet spot" we are trying to
reach and some number of candidate points scattered around that spot.
with standard qhull, we can find the points nearest that sweet spot.
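one convenient way to try this: the public-domain qhull code is
wrapped by scipy as scipy.spatial.ConvexHull, so a single-generation
run is only a few lines (the random points below are just a stand-in
for real scored detectors):

  import numpy as np
  from scipy.spatial import ConvexHull  # scipy's wrapper around the qhull library

  points = np.random.rand(300, 4)       # 300 candidates scored on 4 objectives
  hull = ConvexHull(points)

  print(hull.vertices)                  # indices of the points that sit on the hull
  print(hull.volume)                    # the hull volume, used to compare generations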
in kareem's idea, we take the points on the hull near the sweet spot
and combine them to form new points. we then assess the new points,
add them to the old points, and run a second-generation qhull. now the
hull contains the best combinations of generation one and generation
two. if the volume of the generation-two hull is greater than the
volume of the generation-one hull, then repeat.
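a sketch of that loop, assuming a combine(a, b) that conjoins two
parent treatments into a new candidate and a score(t) that returns its
tuple of objectives (both are placeholders, and the 1% volume
tolerance and the 20-generation cap are arbitrary guesses):

  from itertools import combinations
  import numpy as np
  from scipy.spatial import ConvexHull

  def iq_generations(treatments, combine, score, max_gens=20, tolerance=0.01):
      # treatments: a list of candidates; score(t) -> tuple of objectives to maximize
      points = np.array([score(t) for t in treatments])
      hull = ConvexHull(points)
      for _ in range(max_gens):
          # pair up the treatments whose scores sit on the current hull
          parents = [treatments[i] for i in hull.vertices]
          children = [combine(a, b) for a, b in combinations(parents, 2)]
          treatments = treatments + children
          points = np.vstack([points, [score(c) for c in children]])
          new_hull = ConvexHull(points)
          # keep going only while the hull volume grows significantly
          if new_hull.volume <= hull.volume * (1.0 + tolerance):
              return new_hull, treatments
          hull = new_hull
      return hull, treatments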
it is not clear if this will work (it is not implemented yet). but
public domain code for single-generation qhull is all over the web and
it runs FAST in up to ten dimensions. heck, let's give it a go.
it is also not clear if this is a heuristic or a complete search. to
prove that it is complete, we would have to show that any combination
of points on the hull is itself on the hull, or better. i suspect that
this will only be true if the dimensions have some restrictive
properties (something that promises monotonicity).
but even if it's only a heuristic, it is still a "wow". the cull
offered by qhull is amazing. in one study, kareem found that only six
points out of 300 possible defect detectors were on the hull, which
meant that the other 294 could be ignored.
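(with the scipy sketch above, that cull is just a matter of reading
off hull.vertices: the six survivors are points[hull.vertices], and
the remaining 294 rows never need to be looked at again.)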