INFSCI 2725: Data Analytics


Instructor: Marek J. Druzdzel

Overview

"Big Data underscores the real need for a scalable and reliable analytics infrastructure to support data analysis and modeling." --- Source: Business Wire 2012-09-06 19:01:00
"If you aren't taking advantage of big data, then you don't have big data, you have just a pile of data." --- Jay Parikh, VP of infrastructure at Facebook

The 21st century has been called by many "The Century of Data." We see more and more data collected with the expectation (justified by empirical evidence!) that analyzing these data will give organizations a competitive edge and will help them to excel. The amount of data collected is enormous and growing. In many cases, the analysis of these data lags behind their collection. Mark Twain once wrote, "A man who does not read has no advantage over a man who cannot read." A similar sentence is most certainly true: "A man who does not analyze his data has no advantage over a man who has no data." Methods for analyzing data, collectively called "data analytics," are therefore crucial in this business.

INFSCI 2725 is an introductory course in the area of so-called "Big Data," aimed at graduate students in Information Science and related disciplines. It focuses on the essential technologies that underlie the collection, storage, and processing of data. The biggest and most important of these is analyzing data (many of the techniques are covered in more depth in Data Mining), although we will also spend some time on data collection and storage (covered in detail in the Advanced Topics in Database Management course).

The skills that you will need as a data scientist span a large number of areas, such as statistics, databases, systems, programming, machine learning, artificial intelligence, business intelligence, and visualization. The knowledge that you need to acquire as a data scientist is not easy to gain by following courses in each of these areas, as concepts at their intersection are often obscured. The goal of this course is to simplify and present the most relevant material that you would otherwise have to learn in traditional disciplines and to point out the commonalities between these disciplines. The course is not designed to teach you the formal details of statistical procedures used in data analysis or to make you an expert practitioner of specific analysis tools. You can and should get that in other courses, typically in Statistics, Artificial Intelligence, and Data Mining. The point of this class is to develop broad critical abilities for approaching the collection, storage, and analysis of very large data sets. The course aims to improve your ability to think about data and information and to choose ways of extracting information and knowledge from data.

This is a fairly new course and I am still looking for the best way of synthesizing and presenting the material. The fact that the course is new is not the only difficulty here. An additional problem that we have to cope with is that the material itself is very new and changing rapidly. We are pretty much working at a frontier and no textbooks exist that cover all the material. For these reasons, the schedule may change slightly as we go. The set of readings may change to some degree as well. I have planned assignments, a project that will allow you to get hands-on experience in data analytics, and two examinations that will test your general mastery of the material.

As you might have already experienced, being a graduate student requires intelligence, independent and creative thinking, and, most of all, commitment to hard work. This course reinforces this. There may be a higher than usual amount of reading. I have selected the readings in such a way that they will be fun to read and I expect that you will do them all with pleasure. The assignments and the project will offer you hands-on experience, something that will make you appreciate the size of the data sets that we have to work with in the 21st century. The workload in this class will be moderately heavy, but I believe that you will find it interesting and important. I require your commitment: doing the readings, coming to classes, and participating actively in them. In return, I promise that you will have fun and you will learn many useful skills.

Syllabus (Fall 2014, PDF)
Marek Druzdzel's teaching page
Marek Druzdzel's home page


marek@sis.pitt.edu / Last update: 27 August 2014