Data Mining

Special Topics: Math & Formal Foundations

Fall 2002 (03-1)

Instructor: Dr. Stephen Hirtle
Office: B201 IS Building
Office Phone: 624-9434
Office Hours: Tues 2:00-4:00pm or by appointment
Class Meets: Monday, 3:00-5:50pm, 403 IS Bldg
Prerequisite: IS 2020 or permission of instructor

All available information about this course may be found at:

Overview. Data mining has become an increasingly popular statistical method for knowledge discovery. With roots in classical scaling techniques and artificial intelligence, data mining provides a set of tools for the discovery and visualization of patterns in data warehouses. Data mining has been used in diverse applications, from the discovery of quasars to the analysis of web logs. This course will look at various data mining tools to see how they work and what kinds of knowledge can be discovered. In addition, we will try to separate the hype that exists in some popular accounts of data mining from the reality of the methods.

The course will provide an important foundation for further study in diverse areas, such as information retrieval, cognitive science, and marketing. The techniques discussed also underlie many modern data mining methods. The course will count as one of the two required foundations courses in the MSIS program or as one of the two required statistics courses for the Information Science Track of the DIST PhD program.

Please note that a certain degree of mathematical and statistical fluency is required. It will be assumed that students have completed IS 2020 or the equivalent. If you have not taken IS 2020 or an equivalent course, please contact the instructor by email for permission to take this course.

Materials. The primary text for the term is

J. Han and M. Kamber.
Data Mining: Concepts and Techniques.
Morgan Kaufmann, 2000.

The following software package from the University of Pittsburgh Software Licensing Services is also required:

Clementine 6.5, the SPSS data mining package.

A complete set of PowerPoint slides to accompany the text can be found at: There is a separate list of the weekly reading assignments. Data analysis may also require accounts on sunfire.sis, unixs.cis, and vms.cis. See me if you have trouble getting an account on any of these machines. We will be using several packages and programs, including S-PLUS and Clementine/SPSS, during the semester. Additional links related to the class can be found at the following sites:

Evaluation. Evaluation will occur through a combination of three short papers and a term project. The papers can be one of two types: a review or an analysis. A review will describe a current problem in the data mining field and discuss various proposed solutions, including any solutions that you might suggest. An analytical paper will consist of a short, written analysis of a data set using one or more techniques. The papers will be limited to 5 pages of text, plus supporting graphs and tables. For each paper, a set of guidelines/topics will be distributed two weeks before the paper is due. Each paper will count 50 points. A late paper will lose 2 points for each day it is late. No paper will be accepted more than 7 days after the due date. All papers must be completed independently. You must submit at least one paper of each type during the term.

In addition to the short papers, there will be a separate term project, which will be similar to the papers in style, but will include a general discussion and must cover at least two of the topic areas. The written part of the project will be limited to 8 pages of text. The project will also be presented orally during the last class meeting. The entire project will be worth 100 points, including 10 points for the oral presentation. Any extenuating circumstances that would result in missing the final deadline must be discussed in advance with the instructor. The oral presentation cannot be made up.

Special circumstances. If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and the Office of Disability Resources and Services, 216 William Pitt Union (412-648-7890 / TTY: 412-383-7355), as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course. In addition, you should be aware that my office is up a short flight of stairs. If this is problematic, I am happy to arrange a meeting in an accessible location at any time.

Other References | CSNA | KD Nuggets | Citeseer