INFSCI 1092: Data Analytics

Spring 2013: Syllabus


This is the first course in Data Analytics for undergraduate IS students. The objective of the course is to give students a broad overview of the various aspects of data analytics, such as exploring, scrubbing, modeling, and interpreting data. To this end, the course uses R as the primary tool, with some use of Weka, Excel, and MATLAB. Because this is a first course, it does not rigorously develop data mining schemes or statistics, nor does it give a theoretical treatment of topics such as visualization. However, students are expected to program in R and to develop a clear understanding of basic data modeling and of ideas related to statistics.

Prerequisites (concepts):

None. An interest in analytical thinking and in data, along with a first course in statistics, will be helpful.

Contact Information:

Prashant Krishnamurthy
Office: DIST 718
Phone: 412-624-5144
E-mail: prashant AT mail DOT sis DOT pitt DOT edu
Course webpage: http://www2.sis.pitt.edu/~prashant/inf1092
Office hours: Wednesdays, 10:00 a.m. - 11:00 a.m., or by appointment
GSA: Xin Wang

Textbooks:

Wolfgang Jank, Business Analytics for Managers, Springer 2011
Lectures will be drawn from several sources (the textbook, other books, articles, papers, and videos).

Required Software: R on Windows, Mac OS X, or Linux

References:

Data Mining and Statistics

  • D. J. Hand, Statistics, Sterling Press, 2008
  • G. Shmueli, N. R. Patel, and P. C. Bruce, Data Mining for Business Intelligence, John Wiley and Sons, 2010
  • J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2012
  • I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011

R and Related Material

  • N. Matloff, The Art of R Programming, No Starch Press, 2011
  • A. F. Zuur, E. N. Ieno, and E. H. W. G. Meesters, A Beginner's Guide to R, Springer, 2009
  • J. Albert and M. Rizzo, R by Example, Springer, 2012
  • R. M. Heiberger and E. Neuwirth, R through Excel, Springer, 2009

Visualization

  • S. Few, Now You See It: Simple Visualization Techniques for Quantitative Analysis, Analytics Press, 2009
  • D. Cook and D. F. Swayne, Interactive and Dynamic Graphics for Data Analysis with R and GGobi, Springer, 2007

Websites and Webinars

  • Revolution Analytics
  • Dataists
  • Kaggle Tutorials

Grading:

    Homework/In-Class Assignments 25%
    Midterm 20%
    Final 30%
    Project 25%

Policies:

  • All work must be the student's own unless collaboration is explicitly permitted.
  • Late assignments will not be accepted except in exceptional circumstances.
  • Homework is due ONE week after it is assigned unless otherwise mentioned.
  • Homework will be assigned every week unless otherwise mentioned.
  • Check the webpage for homework even if it is not explicitly mentioned in class.
  • Labs (where assigned) will be due TWO weeks after they are assigned.
  • Students are responsible for doing the labs and submitting the reports to me.
  • Check the webpage regularly for lab instructions and changes.
  • Keep checking the webpage regularly for other changes.
  • All written work must be legible and clear to receive credit. Credit will not be given for work that is misinterpreted because it is vague.
Course Outline:

This schedule is only a guideline and is subject to change depending on how the course progresses.

  • Week 1: Introduction; Big Picture
  • Weeks 2-3: Basic Tools (especially R); Types of Data; Data Representation; Some Public Data Sets
  • Week 4: Data Exploration I
  • Week 5: Data Exploration II
  • Week 6: Data Collection and Storage
  • Week 7: Review and Midterm Exam
  • Week 8: Data Exploration III
  • Week 9: Models I: Regression
  • Week 10: Models II: Making Models Flexible
  • Week 11: Data Scrubbing
  • Week 12: Models III: Dangers of Too Much Data
  • Week 13: Fine Tuning Models
  • Week 14: Final Exam