
Decision Making in Sports

INFSCI 0530

General Information

Data and analytics have been part of the sports industry since as early as the 1870s, when the first box score in baseball was recorded. However, only recently have advanced data mining and machine learning techniques been used to support the operations of sports franchises. Part of the reason is the ability to collect more fine-grained data; an equally important factor in this turn to analytics is the success and competitive advantage that early adopters enjoyed (popularized by the best-seller "Moneyball", which described the success the Oakland Athletics had with analytics). Draft selection, game-day decision making, and player evaluation are just a few of the applications where sports analytics plays a crucial role today. Apart from the sports clubs, other stakeholders in the industry (e.g., the leagues' offices and the media) invest in analytics, and the leagues increasingly rely on data to decide on potential rule changes. In this course, we will introduce data science concepts for sports analytics. Students will be introduced to concepts related to data quality, data analysis, and modeling, as well as data visualization.

For whom? Students do not need to have programming skills. While some basic statistical background is beneficial, the course is self-contained. Students should also have an interest in sports, since all the analytical examples are taken from the sports field.

Course Info

Course meetings: Tu/Thu, 1pm-2:15pm, IS 404

Instructor: Konstantinos Pelechrinis

Syllabus: (pdf)

Week 1

How can data inform decisions in sports (and other areas)? What biases might be present during a data-based decision process? In Week 1 we will discuss the ways data can facilitate decision making in sports. We will see what we need to be careful about to avoid biases in our decisions, and we will discuss data ethics and algorithmic auditing. Finally, we will see the CRISP-DM framework for planning data analysis projects. (ppt)
Readings:
  • Woodward, J. R. "Professional football scouts: An investigation of racial stacking." Sociology of Sport Journal 21.4 (2004): 356-375.
  • Mez, Jesse, et al. "Clinicopathological evaluation of chronic traumatic encephalopathy in players of American football." Jama 318.4 (2017): 360-370.
  • Kendall, Jason. "Designing a research project: randomised controlled trials and their principles." Emergency medicine journal: EMJ 20.2 (2003): 164. (or something else about causality-correlation)
  • Floridi, Luciano, and Mariarosaria Taddeo. "What is data ethics?" Phil. Trans. R. Soc. A 374 (2016): 20160360.
  • Brown, Shea, Jovana Davidovic, and Ali Hasan. "The algorithm audit: Scoring the algorithms that score us." Big Data and Society 8.1 (2021): 2053951720983865.
  • The Washington Post, "As biometrics boom, who owns athletes’ data? It depends on the sport"
  • Osborne, Barbara. "Legal and Ethical Implications of Athletes' Biometric Data Collection in Professional Sport." Marq. Sports L. Rev. 28 (2017): 37.

Week 2

This week we will introduce the notion of empirical probability and the idea behind Bayes' rule. We will see how thinking in a Bayesian way is the first step towards assessing uncertainty and making an informed decision. (ppt)
Lecture data and analysis: (xlsx)
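
As a minimal sketch of Bayes' rule in Python (the course materials themselves are in Excel; the scenario, the function name `bayes_posterior`, and every number below are hypothetical and chosen only for illustration): suppose a team wins 60% of its games, scores first in 70% of its wins, and scores first in 40% of its losses. Bayes' rule updates the win probability once we observe that the team scored first.

```python
def bayes_posterior(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of observing the evidence E.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical inputs (not taken from the course data).
p_win = 0.60                # prior: P(win)
p_first_given_win = 0.70    # P(scores first | win)
p_first_given_loss = 0.40   # P(scores first | loss)

posterior = bayes_posterior(p_win, p_first_given_win, p_first_given_loss)
print(f"P(win | scored first) = {posterior:.3f}")
```

The posterior is larger than the 0.60 prior because scoring first is more likely in wins than in losses under these assumed numbers.
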
Readings:

Week 3

This week we will introduce the notion of random variables and variance as a measure of consistency and uncertainty. We will see how we can use these notions to start tackling decisions that need to be made during a game. (ppt)
Lecture data and analysis: (xlsx)
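
To illustrate variance as a measure of consistency, here is a small Python sketch (not part of the course materials; the two players and their point totals are made up). Both players average the same number of points, but their variances differ widely.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical points per game for two players (made-up numbers).
player_a = [20, 22, 18, 21, 19]
player_b = [5, 35, 10, 30, 20]

for name, points in (("A", player_a), ("B", player_b)):
    # pvariance/pstdev treat the list as the full population of games.
    print(
        f"Player {name}: mean={mean(points)}, "
        f"variance={pvariance(points)}, std={pstdev(points):.2f}"
    )
```

Same average, very different spread: player A is the more consistent scorer, which matters when a decision depends on how uncertain the outcome is and not only on the expected value.
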
Readings:

Week 4

This week we will introduce the notion of hypothesis testing. We will see the process of randomization that is fundamental for this testing. We will also discuss the appropriate evaluation of decision making and the outcome bias. (ppt)
Lecture data and analysis: (xlsx)
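
One way to make the randomization idea concrete is a permutation test for a difference in means. The Python sketch below is an illustration under stated assumptions, not the course's own procedure: the function name `permutation_test`, the two-sided test statistic, and the home/away scores are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical points scored at home vs. away (made-up numbers).
home = [102, 110, 98, 105, 111, 107]
away = [99, 101, 95, 104, 100, 97]
print("estimated p-value:", permutation_test(home, away))
```

A small p-value means that a difference this large rarely appears when the home/away labels are assigned at random.
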
Readings:

Week 5

This week we will see how we can quantify the linear correlation between two variables. We will extend this to multiple variables, and we will talk about regression problems and the process of learning the model parameters. We will also see an example of non-linear regression. (ppt)
Lecture data and analysis: (xlsx)
Using the Solver in Excel: (mov)
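
For readers who prefer code to a spreadsheet, the sketch below computes the Pearson correlation and fits a one-variable least-squares line in Python. It uses the closed-form solution instead of a numerical solver, the function names are my own, and the data are made up for illustration.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, syy, sxy, mx, my


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between x and y."""
    sxx, syy, sxy, _, _ = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    sxx, _, sxy, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: three-point attempts per game vs. points per game.
attempts = [25, 28, 30, 33, 36, 40]
points = [101, 104, 103, 108, 110, 113]

slope, intercept = fit_line(attempts, points)
print(f"r = {pearson_r(attempts, points):.3f}")
print(f"points = {intercept:.2f} + {slope:.2f} * attempts")
```

The slope and intercept are the parameters that minimize the sum of squared errors, which is the same objective a spreadsheet solver would minimize numerically.
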

Week 6

This week we will see how we can identify correlations between a binary variable and continuous variables through logistic regression. We will also see linear probability models, as well as metrics for evaluating probabilistic predictions. (ppt)
Lecture data and analysis: (xlsx)
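
The Python sketch below shows the logistic function that turns a linear score into a probability, together with two common metrics for probabilistic predictions (the Brier score and log loss). Whether the course uses these particular metrics is an assumption on my part; the coefficients and game data are hypothetical and not fitted to anything.

```python
from math import exp, log


def logistic(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def brier_score(outcomes, probs):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)


def log_loss(outcomes, probs, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * log(p) + (1 - y) * log(1.0 - p)
    return -total / len(outcomes)


# Hypothetical model: P(win) = logistic(b0 + b1 * halftime point differential).
b0, b1 = 0.0, 0.15  # made-up coefficients, not estimated from data
halftime_diff = [10, -4, 3, -12, 7]
won = [1, 0, 1, 0, 0]

predicted = [logistic(b0 + b1 * d) for d in halftime_diff]
print("Brier score:", round(brier_score(won, predicted), 3))
print("Log loss:   ", round(log_loss(won, predicted), 3))
```

Lower is better for both metrics; always predicting 0.5 gives a Brier score of 0.25, which is a useful baseline.
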
Readings:

Week 7

Midterm

Week 8

This week we will see the bias-variance tradeoff and the problem of overfitting. We will also introduce the notion of regularization for preventing overfitting. Finally, we will see the difference between descriptive and predictive models. (ppt)
Lecture data and analysis: (xlsx)
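
As one concrete instance of regularization, the Python sketch below applies a ridge (L2) penalty to a one-variable regression. The choice of ridge is an assumption made for illustration, since the page does not say which form of regularization is covered; the function name and data are hypothetical. With centered data and an unpenalized intercept, the ridge slope has the closed form Sxy / (Sxx + penalty).

```python
def ridge_slope(x, y, penalty):
    """Slope of a one-variable ridge regression with an unpenalized intercept.

    penalty = 0 reproduces ordinary least squares; larger values shrink
    the slope toward zero.
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / (sxx + penalty)


# Hypothetical small sample (made-up numbers).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

for penalty in (0, 1, 10, 100):
    print(f"penalty={penalty:>3}: slope={ridge_slope(x, y, penalty):.3f}")
```

The shrinking slope is the mechanism by which the penalty trades a little bias for lower variance in the fitted model.
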
Readings:

Week 9

This week we will take a look at Monte Carlo simulations as a way to estimate the probability of complex events. We will also see the bootstrap as a non-parametric way of estimating population parameters from a set of observations. (ppt)
Lecture data and analysis: (xlsx)
Running Monte Carlo simulations in Excel: (mp4) (mp4)
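
The same two ideas can be sketched in Python. The example below is illustrative only: the best-of-seven scenario, the assumption that games are independent with a fixed win probability, the percentile method for the bootstrap interval, the function names, and the data are all my own choices and not taken from the lecture files.

```python
import random


def series_win_probability(p_game, n_sims=20_000, seed=0):
    """Monte Carlo estimate of winning a best-of-7 series.

    Assumes independent games, each won with probability p_game.
    """
    rng = random.Random(seed)
    series_won = 0
    for _ in range(n_sims):
        wins = losses = 0
        while wins < 4 and losses < 4:
            if rng.random() < p_game:
                wins += 1
            else:
                losses += 1
        series_won += wins == 4
    return series_won / n_sims


def bootstrap_mean_ci(data, n_resamples=2_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        sum(rng.choices(data, k=n)) / n  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = means[max(0, int((alpha / 2) * n_resamples))]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)]
    return lower, upper


print("P(win series | p_game = 0.6):", series_win_probability(0.6))

# Hypothetical points per game for one player (made-up numbers).
points = [18, 25, 22, 30, 15, 27, 21, 24, 19, 26]
print("95% bootstrap CI for the mean:", bootstrap_mean_ci(points))
```

Increasing `n_sims` or `n_resamples` reduces the simulation noise in each estimate at the cost of run time.
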
Readings:

Week 10

This week we will take a look at various cognitive fallacies that humans experience when they try to assess probabilities and risk. We will also introduce the premise of natural experiments as a quasi-causal inference method. (ppt)
Lecture data and analysis: (xlsx)
Readings:

Week 11

This week we will see an introduction to game theory (in particular, two-person, zero-sum games) and its application in sports. We will see the notion of Nash equilibrium and how it can be used to find the optimal pass-run mix in football, choose the side for a penalty kick, or decide how to defend the corner 3 in basketball. (ppt)
Lecture data and analysis: (xlsx)
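
For a 2x2 zero-sum game without a pure-strategy equilibrium, the mixed-strategy Nash equilibrium follows from making the opponent indifferent between their two options. The Python sketch below applies that to a penalty kick; the function name `solve_2x2_zero_sum` and the scoring probabilities are hypothetical and not drawn from the course data.

```python
def solve_2x2_zero_sum(payoffs):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game.

    payoffs[i][j] is the row player's payoff when row plays i and column
    plays j. Returns (p, q, value): p is the probability that row plays
    row 0, q the probability that column plays column 0.
    """
    (a, b), (c, d) = payoffs
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("game has a pure-strategy equilibrium (saddle point)")
    denom = a - b - c + d
    p = (d - c) / denom  # makes the column player indifferent
    q = (d - b) / denom  # makes the row player indifferent
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical scoring probabilities: rows = kicker aims (left, right),
# columns = goalkeeper dives (left, right).
penalty_kick = [[0.60, 0.95],
                [0.90, 0.70]]

p, q, value = solve_2x2_zero_sum(penalty_kick)
print(f"kicker aims left with probability {p:.3f}")
print(f"goalkeeper dives left with probability {q:.3f}")
print(f"equilibrium scoring probability {value:.3f}")
```

At these mixing probabilities neither player can improve by shifting toward one side, which is what makes the pair of strategies an equilibrium.
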
Readings:

Week 12

This week we will see an introduction to networks and how they can be applied in sports. We will see the notions of centrality and importance, and how we can rank teams using Google's PageRank algorithm, which was developed to rank webpages. (ppt)
Lecture data and analysis: (xlsx)
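
A minimal Python sketch of PageRank applied to teams follows. It rests on several assumptions of mine that the page does not state: each game adds a directed edge from the loser to the winner, the damping factor is 0.85, a team with no losses spreads its score uniformly over all teams, and the scores are computed by power iteration. The teams and results are made up.

```python
def pagerank(teams, games, damping=0.85, iterations=100):
    """Rank teams by PageRank on a win-loss graph.

    games is a list of (winner, loser) pairs; each loser "votes" for the
    team that beat it. Returns a dict mapping team -> score (scores sum to 1).
    """
    n = len(teams)
    votes_for = {team: [] for team in teams}
    for winner, loser in games:
        votes_for[loser].append(winner)

    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        new_rank = {team: (1.0 - damping) / n for team in teams}
        for team, winners in votes_for.items():
            if winners:
                share = damping * rank[team] / len(winners)
                for winner in winners:
                    new_rank[winner] += share
            else:
                # Undefeated team: no outgoing edges, so spread uniformly.
                share = damping * rank[team] / n
                for other in teams:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical results as (winner, loser) pairs.
teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "C")]

scores = pagerank(teams, games)
for team in sorted(scores, key=scores.get, reverse=True):
    print(team, round(scores[team], 3))
```

Unlike a plain win percentage, this ranking gives more credit for beating teams that are themselves highly ranked.
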

Every play is a data point!