Probability & Statistics

Probability and statistics are the foundation of machine learning. ML models learn patterns from data, quantify uncertainty, and make probabilistic predictions.

Why it matters: Every ML prediction is probabilistic. Understanding distributions, variance, and statistical inference is essential for building robust models.

Core Concepts

Random Variable

A variable whose value is determined by chance.

X = outcome of dice roll
X ∈ {1, 2, 3, 4, 5, 6}

Discrete or continuous values

Probability Distribution

Describes how probabilities are distributed over values.

P(X = x) for discrete
f(x) for continuous

Sum/integral equals 1
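
As a minimal sketch of a discrete distribution, here is the probability mass function of a fair six-sided die, with a check that the probabilities sum to 1:

```python
# Probability mass function (pmf) of a fair six-sided die.
pmf = {x: 1/6 for x in range(1, 7)}

# A valid discrete distribution must sum to 1 (up to floating-point rounding).
total = sum(pmf.values())
print(total)
```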

Expected Value (Mean)

Average value weighted by probability.

E[X] = Σ x·P(x)
μ = mean

Center of distribution
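
The formula can be applied directly to the die example: weighting each face by its probability gives E[X] = 21/6 = 3.5.

```python
# Expected value of a fair die: E[X] = Σ x · P(x)
pmf = {x: 1/6 for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 3.5
```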

Variance & Std Dev

Measure of spread around the mean.

Var(X) = E[(X - μ)²]
σ = √Var(X)

Quantifies uncertainty
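
Continuing the die example, the same definitions give the variance and standard deviation in a few lines:

```python
import math

pmf = {x: 1/6 for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())               # E[X] = 3.5
var = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # Var(X) = E[(X - μ)²]
sigma = math.sqrt(var)                                # σ = √Var(X)
print(var, sigma)  # ≈ 2.9167, ≈ 1.7078
```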

Common Distributions

Normal (Gaussian)

Continuous

Bell curve. Most common in nature and ML.

N(μ, σ²) - defined by mean and variance

Used in: Gaussian processes, noise modeling, initialization
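
Python's standard library includes `statistics.NormalDist` (Python 3.8+), which makes the key properties of the bell curve easy to verify, including the "68%" part of the 68-95-99.7 rule:

```python
from statistics import NormalDist

# Standard normal N(0, 1): mean 0, variance 1.
z = NormalDist(mu=0, sigma=1)

print(z.cdf(0))              # 0.5 — half the mass lies below the mean
print(z.cdf(1) - z.cdf(-1))  # ≈ 0.6827 — mass within one std dev of the mean
```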

Bernoulli

Discrete

Binary outcome: success (1) or failure (0).

P(X = 1) = p, P(X = 0) = 1-p

Used in: Binary classification, coin flips
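
A quick sketch (with a hypothetical success probability of 0.3): the Bernoulli mean and variance follow directly from the definition, and simulated draws converge to the mean:

```python
import random

p = 0.3               # hypothetical success probability
mean = p              # E[X] = 1·p + 0·(1 - p) = p
var = p * (1 - p)     # Var(X) = p(1 - p)

# Simulate draws: 1 with probability p, else 0.
random.seed(0)
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
print(sum(draws) / len(draws))  # ≈ 0.3
```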

Uniform

Both

All outcomes equally likely.

P(X = x) = 1/n for discrete; f(x) = 1/(b - a) on [a, b] for continuous

Used in: Random initialization, sampling
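
For the discrete case, sampling uniformly from {1, ..., 6} and tallying frequencies shows each outcome landing near 1/6:

```python
import random

# Discrete uniform over {1, ..., 6}: each outcome has probability 1/6.
random.seed(0)
n = 60_000
counts = {x: 0 for x in range(1, 7)}
for _ in range(n):
    counts[random.randint(1, 6)] += 1

for x, c in counts.items():
    print(x, c / n)  # each empirical frequency ≈ 1/6 ≈ 0.1667
```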

Bayes' Theorem

The foundation of probabilistic reasoning. Update beliefs based on evidence.

P(A|B) = P(B|A) · P(A) / P(B)
P(A|B): Posterior (updated belief)
P(B|A): Likelihood (probability of the evidence given the hypothesis)
P(A): Prior (initial belief)
P(B): Evidence / marginal likelihood (normalization constant)
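
A minimal sketch of the update rule, using hypothetical numbers for a disease-testing scenario (1% prevalence, 99% detection rate, 5% false-positive rate):

```python
# Bayes' theorem: A = has disease, B = test is positive.
p_a = 0.01              # Prior: 1% of the population has the disease
p_b_given_a = 0.99      # Likelihood: test detects the disease 99% of the time
p_b_given_not_a = 0.05  # False-positive rate

# Marginal P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior: P(A|B) = P(B|A) · P(A) / P(B)
posterior = p_b_given_a * p_a / p_b
print(posterior)  # ≈ 0.167 — even a positive test leaves the disease unlikely
```

The counterintuitive result (a 99%-accurate test yields only a ~17% posterior) is exactly why the prior matters: the disease is rare, so most positives are false positives.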

Statistical Concepts

Sampling: Draw a subset from a population to estimate its properties.
Hypothesis Testing: Test whether an observed effect is statistically significant.
Confidence Intervals: A range likely to contain the true parameter value.
P-value: Probability of observing data at least as extreme as that seen, assuming the null hypothesis is true.
Correlation: A measure of the linear relationship between two variables.
Regression: Model the relationship between dependent and independent variables.
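
Correlation, one of the concepts above, can be computed from first principles in a few lines; `pearson_r` below is a hypothetical helper name for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # ≈ 1.0 — perfect positive linear relationship
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # ≈ -1.0 — perfect negative relationship
```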

AI Applications

Naive Bayes classifier uses Bayes' theorem
Gaussian distributions for weight initialization
Maximum Likelihood Estimation for parameter learning
Bayesian inference for uncertainty quantification
Statistical tests for model comparison
Probability distributions in generative models
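
As a minimal illustration of Maximum Likelihood Estimation from the list above: for i.i.d. Bernoulli observations, the likelihood p^k(1-p)^(n-k) is maximized at the sample mean, so the MLE is a one-liner (the data here is hypothetical):

```python
# MLE for a Bernoulli parameter p: p̂ = (number of successes) / (number of trials).
observations = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical 0/1 data
p_hat = sum(observations) / len(observations)
print(p_hat)  # 0.7
```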

Key Takeaway: ML is fundamentally probabilistic. Models learn probability distributions, make probabilistic predictions, and quantify uncertainty.