Machine learning is, essentially, applied linear algebra & probability.
Every ML algorithm — from a simple linear regression to a neural network — is built on the same mathematical bedrock. The models, languages, and tools change constantly; the foundations do not.
- Linear models & regularization
- Generalized linear models (logistic regression)
- Bayesian linear models
- Neural networks
- Decision trees, random forests, boosting
- Unsupervised: PCA & clustering
- Bias-variance trade-off
- Cross-validation & resampling
- Maximum likelihood (MLE)
- Bayesian inference (MAP)
(Course roadmap: Fitting (now) → Learning → Learning Deep Dive.)
The course deliberately starts with applied ML (regression & classification hands-on) before building up the statistical theory. This lets you see the big picture first, then understand why everything works.
The course uses R because the focus is on statistics, not software engineering. R makes the statistical fundamentals front-and-center.
⚡ If you understand the statistics and linear algebra, you have the foundation to work in any programming language once you know the syntax. The language is a tool; the math is the skill.
We want to learn a function f(x) that maps inputs x to outputs y. Because we learn from noisy, finite data, the relationship is always approximate — there will always be error.
Inputs: denoted x (scalar), **x** (vector), or X (matrix). What we observe and feed into the model.
Outputs: denoted y or t. What we want to predict. Can be continuous, categorical, count, etc.
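The idea of learning an approximate f(x) from noisy, finite data can be sketched in a few lines of R. The true function and noise level below are illustrative assumptions, not from the course:

```r
# Sketch: learn an approximation of f(x) from noisy, finite observations.
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
f <- function(x) 2 + 0.5 * x          # the true (normally unknown) relationship
y <- f(x) + rnorm(n, sd = 1)          # observed outputs = f(x) plus noise

fit <- lm(y ~ x)                      # learned approximation of f
coef(fit)                             # estimates land near the true (2, 0.5)
```

Even with the model family chosen correctly, the fitted coefficients only approximate the truth, because the data are finite and noisy.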
Model assumptions correspond to probability distributions. Understanding the distributions behind popular model formulations is what lets us reason about when and why those models work.
Two frameworks for finding optimal parameters: Maximum Likelihood Estimation (MLE) — frequentist statistics, and Maximum a Posteriori (MAP) — Bayesian inference. Both will be covered in depth.
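A small worked contrast between the two frameworks, for the simplest possible model: estimating a Gaussian mean with known sd = 1. The prior and its variance are assumptions made up for illustration; the MAP estimate shown uses the standard conjugate-normal result:

```r
# MLE vs MAP for the mean of a Gaussian with known sd = 1.
set.seed(42)
y <- rnorm(20, mean = 3, sd = 1)

mu_mle <- mean(y)                     # MLE: the sample mean maximises the likelihood

tau2 <- 1                             # prior variance of an assumed N(0, tau2) prior
n <- length(y)
# Conjugate-normal MAP: shrink the sample mean toward the prior mean (0 here)
mu_map <- (n * mean(y)) / (n + 1 / tau2)

c(MLE = mu_mle, MAP = mu_map)         # MAP lies between the prior mean and the MLE
</test>
```

The only difference between the two estimates is the prior: MAP multiplies the likelihood by a prior belief before maximising, which pulls the estimate toward the prior mean, most strongly when data are scarce.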
When we understand a model's assumptions, we can:
- Understand what controls the model behavior
- Identify the strengths and weaknesses of different methods
- Recognize when a method is NOT appropriate
- Interpret the model's learned parameters
We have labelled input-output pairs. The output supervises learning of the model parameters. Goal: learn a mapping from input to output.
e.g. Predicting house price from features, classifying emails as spam/not-spam.
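The two canonical supervised tasks can be sketched with toy data. All variable names (sqft, score, spam) and the simulated relationships are made up for illustration:

```r
# Supervised learning, sketched: one regression task, one classification task.
set.seed(7)

# Regression: continuous response (e.g. house price from size)
houses <- data.frame(sqft = runif(50, 50, 250))
houses$price <- 1000 * houses$sqft + rnorm(50, sd = 20000)
price_fit <- lm(price ~ sqft, data = houses)

# Binary classification: spam / not-spam from a single feature
emails <- data.frame(score = rnorm(200))
emails$spam <- rbinom(200, 1, plogis(2 * emails$score))
spam_fit <- glm(spam ~ score, data = emails, family = binomial)

coef(price_fit)
coef(spam_fit)
```

Both fits learn a mapping from input to output; the labelled responses (price, spam) are what supervise the parameter estimates.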
No output labels. Goal: discover interesting patterns and structure in the input data without being told what to look for.
e.g. Clustering customers, PCA for dimensionality reduction.
- Regression: continuous response, any real number.
- Binary classification: exactly 2 possible classes.
- Multiclass classification: 3 or more possible classes.
- Other response types (e.g. counts) also exist; covered later.
Observe variables without a distinction between inputs and responses. Explore the inputs without regard for any label (or when no label exists).
In this course: K-means and hierarchical clustering for grouping observations; Principal Component Analysis (PCA) for dimensionality reduction and discovering latent structure.
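The two tools named above are built into base R. A minimal sketch on the built-in iris data, ignoring the species labels and using only the four measurements:

```r
# Unsupervised learning on iris: K-means clustering and PCA.
set.seed(1)
X <- scale(iris[, 1:4])               # standardise the four measurements

km <- kmeans(X, centers = 3, nstart = 25)   # K-means with 3 clusters
table(km$cluster)                     # cluster sizes

pca <- prcomp(X)                      # PCA on the standardised data
summary(pca)$importance[2, 1:2]       # proportion of variance in PC1 and PC2
```

Neither step uses any label: K-means groups observations by proximity, and PCA finds the directions of greatest variance (here, the first component alone captures most of it).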
Supervised and unsupervised are the two main paradigms, but others exist:
| Paradigm | Description |
|---|---|
| Semi-supervised | Uses a mix of labelled and unlabelled data |
| Self-supervised | Generates its own supervision signal from the data structure (e.g. predicting masked words) |
| Active learning | Model queries for labels on the most informative samples |
| Online learning | Model updates continuously as new data arrives |
| Reinforcement learning | Agent learns via trial and error, receiving reward/penalty signals |
Real ML projects follow this pipeline. Note that EDA appears at multiple points — understanding the data is an ongoing activity, not a one-time step.
(Pipeline diagram: access & clean & preprocess data → training data → best model.)
The ideal data format is a flat rectangular table (a data frame or tibble in R):
| Observation | Input 1 | Input 2 | Response 1 (continuous) | Response 2 (binary) |
|---|---|---|---|---|
| 1 | 5.2 | green | 43.1 | TRUE |
| 2 | 6.1 | green | 57.4 | FALSE |
| 3 | 2.0 | yellow | 18.9 | FALSE |
Each row = one observation/sample. Each column = one input feature or response variable. In real projects, data usually starts scattered across multiple sources and must be merged before reaching this tidy format.
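The example table above can be built directly as a base-R data frame; the extra lookup table and `merge()` call sketch how scattered sources get combined into the tidy format (the third input is invented for illustration):

```r
# The tidy rectangular format: one row per observation, one column per variable.
obs <- data.frame(
  observation = 1:3,
  input1      = c(5.2, 6.1, 2.0),
  input2      = c("green", "green", "yellow"),
  response1   = c(43.1, 57.4, 18.9),
  response2   = c(TRUE, FALSE, FALSE)
)

# Data often arrives scattered: e.g. a second table keyed by observation id
extra <- data.frame(observation = 1:3, input3 = c(10, 20, 30))
tidy  <- merge(obs, extra, by = "observation")   # still one row per observation
str(tidy)
```

Keying every source on a shared observation identifier is what makes the merge safe: after joining, each row is still exactly one observation.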
Despite enormous hype, a very large fraction of "big data" projects fail (estimates around 80%), and only a small fraction of proof-of-concept projects ever make it into production.
Data science sits at the intersection of three skill areas. The "Danger Zone" occurs when you have hacking skills and domain knowledge but no math & statistics understanding — you can produce confident-looking but completely wrong results without noticing.
After building solid foundations, you will be able to answer questions like these with confidence: Which method suits my problem? What assumptions does it make? When will it break down?
The most important part of machine learning is understanding the statistics behind the model. Math may seem intimidating at first, but visual and intuitive explanations — like those found in good online resources — make it far more approachable than it might seem.
ML is being used by companies across every industry: Microsoft Azure, GE, Amazon Web Services (AWS), Google (TensorFlow), Airbnb, Coca-Cola, Netflix, the NFL, and countless others.
| Capability | Example |
|---|---|
| Find patterns automatically | Discover structure in data without being told what to look for |
| Model relationships | Link observed measurements/traits to outcomes (purchases, health, capacity) |
| Predict outcomes | How likely is a new customer to buy a cordless drill given similar customers' behavior? |
| Adapt to feedback | Improve performance relative to environment and user feedback (reinforcement learning) |
| Generate outputs | Create new content based on some class of input — chatbots, generative art, code |
Recent advances in large language models and generative AI have created enormous hype. Keep in mind, however, that the mathematical ideas behind ML are decades old. What changed recently:
- Ubiquitous digital data collection (surveillance, sensors, social media)
- Data centers and supercomputers capable of processing it all
- Proliferation of internet-connected devices (IoT)
- Growth of ML/data science university programs
- Open-source algorithms and frameworks (TensorFlow, scikit-learn, R, etc.)