Timm:: teaching
Detecting change in categorical data: mining contrast sets

new | hot | fun | blog

(by Ying Hu, Wed, May 29, 2002)


Reference

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights

Stephen D. Bay and Michael J. Pazzani, (1999), Detecting change in categorical data: mining contrast sets, Knowledge Discovery and Data Mining, pages 302-306.

@inproceedings{ bay99detecting,
    author = "Stephen D. Bay and
              Michael J. Pazzani",
    title = "Detecting Change in Categorical
             Data: Mining Contrast Sets",
    booktitle = "Knowledge Discovery and Data Mining",
    pages = "302-306",
    year = "1999",
    notes= "Available from
              \url{http://citeseer.nj.nec.com/bay99detecting.html
	      }"}


Problem Addressed

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights


Approach

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights

Our goal is to detect differences between contrasting groups automatically from data by seeking conjunctions of attrbutes and values that have different levels of support in different groups.

Mining Algorithm: STUCCO

  • exhaustive search for all attribute-value combinations (automatically from nchanges=1 to N)
  • for each level(nchanges), count support of every candidate, and determine if it is significant and large, if it should be pruned and if next level should be tried.
  • report a subset from all significant sets. Display the low level results first and then show only the higher lever that are surprising and significantly different.


Details

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights

Determine significance: whether the cset support is independent of group membership use contingency tables, chi-square test. controll type I error: different threshold for different level to determin significance.

Prune: prune a node on the search tree when all its specializations can never be a significant and large cset:

Finding surprising csets: when form the conjunction of N attribute-values, if the actual proportion of the conjunction is similar to the the proportion when those N attribute-values are independent, then the conjunction is not surprising.


Conclusion

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights
Using the approach(algorithm) described in the paper enables us to detect differences across contrasting groups, which can not be satisfactorily solved by other mining algorithm such as association mining.


Comments

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights

This paper defines a problem and describes an algorithm to solve it. It discusses technical details of the algorithm and evaluates it on 2 datasets. Results of the experiments are presented but no validation is given. In fact, there is no obvious method for validation.

The authors compare their work to others' related research:

The paper doesn't mention runtime; it only lightly mentions the discretization method: continuous attributes are discretized into equal sized intervals.

It's experimental output is given as a list with some technical figures. It is hard to tell which set is more important. The number of the sets is more than 100.


Insights

Reference
Problem Addressed
Approach
Details
Conclusion
Comments
Insights
Contrast sets are in fact what we call treatments. It's the only paper by now I've seen that aims at mining treatments.

There is no ordering in the classes. Each contrast set is supposed to represent differences between 2 classes. i.e there is no preference for any class, a contrast set only separate the classes. This is the primary difference between contrast sets and tar2's treatments.

The algorithm automatically finds contrast sets in all levels, and use prune methods to decide if a deeper level is necessary for a node. We can do this on tar2: automatically try nchanges from 1 to N, terminate the process under certain condition.

Prune methods we can use now is promising and skew, both need predifine. (use skew to terminate the combination process--algorithm optimization)

  See who's visiting this page. bite::src ©2003::legal 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


keyword: [TImM'sPaGES]