This is the first course in data analytics for undergraduate IS students. Its objective is to give students a broad overview of the main aspects of data analytics, such as exploring, scrubbing, modeling, and interpreting data. To that end, the course uses R as the primary tool, with some use of Weka, Excel, and Matlab. Because this is a first course, it does not rigorously develop data mining schemes or statistics, nor does it treat topics such as visualization theoretically. Students are, however, expected to program in R and to develop a clear understanding of basic data modeling and related statistical ideas.
Prerequisites (concepts):
None. An interest in analytical thinking and data, and a first course in statistics, will be helpful.
Contact Information:
Prashant Krishnamurthy
Office: DIST 718
Phone: 412-624-5144
E-mail: prashant AT mail DOT sis DOT pitt DOT edu
Course webpage: http://www2.sis.pitt.edu/~prashant/inf1092
Office hours: Wednesdays, 10:00 a.m. - 11:00 a.m., or by appointment
GSA: Xin Wang
Textbooks:
Wolfgang Jank, Business Analytics for Managers, Springer 2011
Lectures will be drawn from several sources (textbook, other books, articles, papers, and videos)
Required Software: R on Windows, Mac OS X, or Linux
References:
Data mining and Statistics
David J. Hand, Statistics, Sterling Press, 2008
G. Shmueli, N. R. Patel, and P. C. Bruce, Data Mining for Business Intelligence, John Wiley and Sons, 2010
J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2012
I. H. Witten, E. Frank, M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011
R and Related Material
N. Matloff, The Art of R Programming, No Starch Press, 2011
A. F. Zuur, E. N. Ieno, E. H. W. G. Meesters, A Beginner's Guide to R, Springer, 2009
J. Albert and M. Rizzo, R by Example, Springer, 2012
R. M. Heiberger and E. Neuwirth, R through Excel, Springer, 2009
Visualization
Stephen Few, Now You See It: Simple Visualization Techniques for Quantitative Analysis, Analytics Press, 2009
D. Cook and D. F. Swayne, Interactive and Dynamic Graphics for Data Analysis with R and GGobi, Springer, 2007
Websites and Webinars
Revolution Analytics
Dataists
Kaggle Tutorials
Grading:
Homework/In-Class Assignments 25%
Midterm 20%
Final 30%
Project 25%
Policies:
All work must be the student's own unless collaboration is explicitly permitted.
Late assignments will not be accepted unless there are exceptional circumstances.
Homework is due ONE week after it is assigned unless otherwise mentioned.
Homework will be assigned every week unless otherwise mentioned.
Check for homework on the webpage even if it is not explicitly mentioned in class.
Labs (where assigned) will be due TWO weeks after they are assigned.
Students are responsible for doing the labs and submitting the reports to me.
Check the webpage regularly for lab instructions and changes.
Check the webpage regularly for other changes as well.
All written work must be legible and clear to receive credit. Vagueness that leads to your work being misinterpreted is not a valid basis for requesting credit.
Course Outline:
This schedule is only a guideline and is subject to change depending on the progression of the course.
Week 1: Introduction; Big Picture
Weeks 2-3: Basic Tools (especially R); Types of Data; Data Representation; Some Public Data Sets
Week 4: Data Exploration I
Week 5: Data Exploration II
Week 6: Data Collection and Storage
Week 7: Review and Midterm Exam
Week 8: Data Exploration III
Week 9: Models I: Regression
Week 10: Models II: Making Models Flexible
Week 11: Data Scrubbing
Week 12: Models III: Dangers of Too Much Data
Week 13: Fine Tuning Models
Week 14: Final Exam