|
Fall 2003 |
- Time: Th 5:00 - 7:30 PM
- Place: ESB 801
- Instructor: Dr. Tim Menzies (Ph.D.)
- Office: ESB 537
- Office Hours: Th 3-5 PM
- E-mail: tim@menzies.us
|
Objectives |
The founder of Lotus, Mitchell Kapor, once said that "getting
information off the Internet is like drinking from a fire hydrant".
His warning should be taken seriously. Unless we can process the
mountain of information that surrounds us, we must either ignore it or
be buried by it.
This subject introduces automatic data mining methods that find the
"pearls in the dust", i.e. the stuff that really matters.
While many automatic tools exist to support this process, these tools
have to be used appropriately. In fact, data miners have to be used
within a knowledge discovery process consisting of:
- Data cleaning: kill noisy and irrelevant data
- Data integration: combine sources into one source
- Data selection: throw away the stuff you don't want
- Data transformation: get the stuff ready for the learner
- Data mining: learn
- Pattern evaluation: think about it
- Knowledge representation: show the results to the users
So this subject has two goals:
- Understanding the algorithms used in the data mining part
- Understanding how to write scripts to handle the rest
All the tools shown here can be downloaded for free and easily
installed on standard desktop machines. That is, everything you'll see
here can be used, by you, at your own computer.
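As a rough illustration of the knowledge discovery steps listed above, here is a minimal Python sketch that runs each step on a tiny made-up data set. Everything in it is an assumption for illustration only: the toy records, the function names, and the one-attribute majority-vote rule that stands in for a real learner. It is not one of the course tools and is not taken from the course material.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (not course data), split over two "sources".
RAW_A = [
    {"id": 1, "outlook": "sunny", "temp": "85", "play": "no"},
    {"id": 2, "outlook": "rainy", "temp": "?", "play": "yes"},  # noisy row
    {"id": 3, "outlook": "overcast", "temp": "64", "play": "yes"},
]
RAW_B = [
    {"id": 4, "outlook": "sunny", "temp": "80", "play": "no"},
    {"id": 5, "outlook": "rainy", "temp": "70", "play": "yes"},
    {"id": 6, "outlook": "overcast", "temp": "72", "play": "yes"},
]


def clean(rows):
    """Data cleaning: drop rows containing the missing-value marker '?'."""
    return [r for r in rows if "?" not in r.values()]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [row for source in sources for row in source]


def select(rows, fields):
    """Data selection: keep only the fields we care about."""
    return [{f: r[f] for f in fields} for r in rows]


def transform(rows):
    """Data transformation: discretize temperature into hot/mild."""
    out = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        r["temp"] = "hot" if int(r["temp"]) >= 75 else "mild"
        out.append(r)
    return out


def mine(rows, attribute, target):
    """Data mining: for each attribute value, predict its most common class."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[attribute]][r[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def evaluate(rule, rows, attribute, target):
    """Pattern evaluation: fraction of rows the rule classifies correctly."""
    hits = sum(1 for r in rows if rule.get(r[attribute]) == r[target])
    return hits / len(rows)


merged = integrate(clean(RAW_A), clean(RAW_B))
rows = transform(select(merged, ["outlook", "temp", "play"]))
rule = mine(rows, "outlook", "play")
accuracy = evaluate(rule, rows, "outlook", "play")

# Knowledge representation: show the results to the user.
for value, label in sorted(rule.items()):
    print(f"if outlook = {value} then play = {label}")
print(f"accuracy on the training rows: {accuracy:.2f}")
```

Note that only one of the functions (mine) does any learning; the rest is the scripting work that surrounds it. The accuracy here is measured on the same rows used for learning, so it says nothing about performance on unseen data.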
|
Mailing list |
All communication with the class outside of lecture time will be
conducted via a Yahoo mailing list.
Students are required to join
that group (go to http://groups.yahoo.com/group/dmwv03/).
|
Schedule |
Lectures end Nov 14.
The subject has two quizzes: week 7 (October 2)
and week 14 (Nov 20).
Project work has deliverables every three to four weeks.
August 2003
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23  --lecture1
24 25 26 27 28 29 30
31
September 2003
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6  --proj1: M5' (marks=5)
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27  --proj2: J48, IR (marks=10)
28 29 30
October 2003
Su Mo Tu We Th Fr Sa
          1  2  3  4  --quiz1: (marks=20)
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25  --proj3: TAR3 (marks=15)
26 27 28 29 30 31
November 2003
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15  --proj4: (marks=25)
16 17 18 19 20 21 22  --quiz2: (marks=25)
23 24 25 26 27 28 29  --thanksgiving
30
|
Marking |
Quizzes : 40 marks
Projects: 60 marks
Project late penalties:
1.5 marks per late day (weekend = 1 day). Late marks begin at midnight
on the due date.
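The late-penalty arithmetic can be sketched as follows. This is only one possible reading of the rule, not an official calculator: it assumes lateness starts once the due date has ended, and that a Saturday and Sunday of the same weekend together count as a single late day. The function names and the example dates are hypothetical.

```python
from datetime import date, timedelta

MARKS_PER_LATE_DAY = 1.5  # from the policy stated above


def late_days(due: date, submitted: date) -> int:
    """Count late days, treating a Saturday-Sunday pair as one day.

    Assumed reading of the policy: the first late day is the calendar
    day after the due date.
    """
    days = 0
    day = due + timedelta(days=1)
    while day <= submitted:
        is_sunday = day.weekday() == 6  # Monday is 0, Sunday is 6
        # Skip a Sunday only if the Saturday before it was itself a late day.
        saturday_already_counted = is_sunday and (day - timedelta(days=1)) > due
        if not saturday_already_counted:
            days += 1
        day += timedelta(days=1)
    return days


def penalty(due: date, submitted: date) -> float:
    """Marks deducted for a submission made on the given date."""
    return MARKS_PER_LATE_DAY * late_days(due, submitted)


# Hypothetical example: due Friday Sep 5, 2003, handed in Monday Sep 8, 2003.
# Saturday and Sunday count as one late day and Monday as another.
example = penalty(date(2003, 9, 5), date(2003, 9, 8))
print(example)
```

Under these assumptions, a project handed in on the Monday after a Friday deadline loses two late days' worth of marks rather than three.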
|
Textbook |
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
by Ian H. Witten and Eibe Frank.
Morgan Kaufmann, October 1999. 416 pages, paper. ISBN 1-55860-552-5
|
Resources |
- Gawk stuff:
- WEKA stuff:
- Other stuff:
|
Attendance Policy |
Attendance at lectures is required.
|
Academic Honesty |
Students are encouraged to discuss class topics among
themselves. However, each student or team should develop the
programming assignment, term paper, and presentation
independently. Copying an entire (or parts of a) research paper from a
copyrighted source and presenting it as if you wrote it yourself is
not an acceptable practice. It will result in an F grade for the
specific product. If you want to reuse, rewrite or interpret a
paragraph written by someone else, you must use quotes and identify
the source of the material.
|
Social Justice |
West Virginia University is committed to social justice. I concur with
that commitment and expect to foster a nurturing learning environment
based upon open communication, mutual respect, and
non-discrimination. Our University does not discriminate on the basis
of race, sex, age, disability, veteran status, religion, sexual
orientation, color or national origin. Any suggestions as to how to
further such a positive and open environment in this class will be
appreciated and given serious consideration. If you are a person with
a disability and anticipate needing any type of accommodation in order
to participate in this class, please advise me and make appropriate
arrangements with Disability Services (293-6700).
|
Expected Workload |
This is a Master's-level course. You MUST be prepared to dedicate 3 - 5 working hours a week to
this class (excluding the time spent in the classroom).
|
Projects |
Project1