Notes on Probability
This document summarizes key concepts in probability theory.
The concepts in this note are introduced in Week 4.
Set Concepts and Notation
- A set $A$ is an unordered collection of distinct elements.
- $\emptyset$ is the empty set.
- $A \cup B$ is the union: all elements in either $A$ or $B$ (or both).
- $A \cap B$ is the intersection: all elements in both $A$ and $B$.
- $A \setminus B$ is the set difference: all elements in $A$ but not in $B$.
- If $A$ is a subset of some larger set $U$ that contains all the possible elements under consideration, then the complement $\bar{A} = U \setminus A$ is the set of elements not in $A$.
- $|A|$ is the cardinality (or size) of $A$. It may be infinite.
- $\mathcal{P}(A)$ is the power set of $A$: the set of all subsets of $A$.
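These operations map directly onto Python's built-in `set` type; a quick sketch (the sets here are arbitrary examples, not from the notes):

```python
from itertools import chain, combinations

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
U = set(range(1, 11))  # the universe of elements under consideration

union = A | B           # all elements in either A or B (or both)
intersection = A & B    # all elements in both A and B
difference = A - B      # all elements in A but not in B
complement = U - A      # all elements of U not in A
cardinality = len(A)    # |A|

# The power set of A: all subsets, represented here as tuples.
power_set = set(chain.from_iterable(
    combinations(sorted(A), r) for r in range(len(A) + 1)))
assert len(power_set) == 2 ** len(A)  # a power set has 2^|A| members
```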
Kinds of Sets
There are, broadly speaking, three kinds of sets in terms of their cardinality:
- Finite sets have a finite number of elements. There is some natural number $n$ such that $|A| = n$.
- Countable sets (or countably infinite sets) are infinite sets with the same cardinality as the set of natural numbers ($\mathbb{N}$). Formally, there exists a bijection (a 1:1 onto mapping) between the members of $A$ and $\mathbb{N}$. Natural numbers, integers, rationals, and algebraics (rationals and roots) are all countable sets.
- Uncountable sets are infinite sets whose cardinality is larger than that of the natural numbers. The real numbers ($\mathbb{R}$) and the power set of the natural numbers ($\mathcal{P}(\mathbb{N})$) are two frequently-encountered uncountable sets.
We also talk about discrete and continuous sets:
- A continuous set $S$ with an order $<$ is a set where we can always find an element to fit between any two other elements: for any $x, y \in S$ such that $x < y$, there is a $z \in S$ such that $x < z < y$.
- A discrete set is a set that is not continuous: there are irreducible gaps between elements.
All finite sets are discrete. The natural numbers and integers are also discrete. The real numbers are continuous. Rationals and algebraics are also continuous, but we won't be using them directly in this class.
Note
While the rationals are continuous under this definition, they are still countable: continuity and countability are separate properties of a set.
Events
A random process (or a process modeled as random) produces distinct individual outcomes, called elementary events.
We use $\Omega$ to denote the sample space: the set of all possible elementary events $\omega$.
Probability is defined over events.
An event $A \subseteq \Omega$ is a set of elementary events.
We use set operations to combine events:
- $A \cap B$ is the event “both $A$ and $B$ happened”.
- $A \cup B$ is the event “either $A$ or $B$ (or both) happened”.
- $A \setminus B$ is the event “$A$ happened but not $B$”.
- If $A \cap B = \emptyset$, then $A$ and $B$ cannot both happen.
With these definitions, we can now define the event space:
- The event space $\mathcal{F}$ is a set of events: $\mathcal{F} \subseteq \mathcal{P}(\Omega)$, with $\Omega \in \mathcal{F}$.
- If $A \in \mathcal{F}$, then its complement $\bar{A} \in \mathcal{F}$. We say $\mathcal{F}$ is closed under complement.
- Since $\Omega \in \mathcal{F}$ and $\bar{\Omega} = \emptyset$, we also have $\emptyset \in \mathcal{F}$.
- If $A_1, A_2, \dots \in \mathcal{F}$, then their union $A_1 \cup A_2 \cup \dots \in \mathcal{F}$. This applies also to unions of countably many sets. We say $\mathcal{F}$ is closed under countable unions.

An event space with these properties is called a sigma field (or sigma algebra).
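For a finite sample space, the power set is always a valid sigma field. A small sketch (the three-element Omega is an arbitrary example) that checks the closure properties directly:

```python
from itertools import chain, combinations

# For a finite sample space, the power set is a valid sigma field.
omega = frozenset({1, 2, 3})
field = {frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))}

assert omega in field and frozenset() in field   # contains Omega and the empty set
assert all(omega - a in field for a in field)    # closed under complement
assert all(a | b in field for a in field for b in field)  # closed under union
```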
Here are some additional properties of sigma fields (these are listed separately from the previous properties because those are the definition of a sigma field and these are consequences — we can prove them from the definitions and axioms):
- If $A, B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$: by De Morgan's laws, $A \cap B = \overline{\bar{A} \cup \bar{B}}$. This extends to countable intersections, so $\mathcal{F}$ is also closed under countable intersection.
- If $A, B \in \mathcal{F}$, then $A \setminus B = A \cap \bar{B} \in \mathcal{F}$.
Probability
Now that we have a sigma field, we can define the concept of probability.
A probability distribution (or measure) $P$ is a function $P: \mathcal{F} \to [0, 1]$ that satisfies the following axioms:

- $P(\Omega) = 1$ — the probability of something happening is 1.
- $P(A) \ge 0$ — non-negativity: probabilities are not negative.
- If $A_1, A_2, \dots$ are (countably many) disjoint events in $\mathcal{F}$, then $P(A_1 \cup A_2 \cup \dots) = P(A_1) + P(A_2) + \dots$ (countable additivity).
A collection of disjoint sets is also called mutually exclusive. What it means is that for any $i \ne j$, $A_i \cap A_j = \emptyset$: no two of the events can happen at the same time.
We call a sample space and sigma field of events equipped with a probability measure, written $(\Omega, \mathcal{F}, P)$, a probability space.
Some additional facts about probability:
- $P(\bar{A}) = 1 - P(A)$ (combined with non-negativity, we have $P(A) \le 1$).
- If $A \subseteq B$, then $P(A) \le P(B)$ (monotonicity).
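These axioms and facts can be checked exactly for a simple example. The fair six-sided die below is an illustration (not from the notes), using exact fractions to avoid rounding:

```python
from fractions import Fraction

# A fair six-sided die, with exact probabilities: P(A) = |A| / 6.
omega = frozenset(range(1, 7))

def prob(event):
    return Fraction(len(event), len(omega))

evens = frozenset({2, 4, 6})
high = frozenset({5, 6})

assert prob(omega) == 1                        # P(Omega) = 1
assert prob(omega - evens) == 1 - prob(evens)  # complement rule
assert prob(high) <= prob(high | evens)        # monotonicity: high is a subset of high | evens
# additivity for disjoint events:
assert prob(frozenset({1, 3}) | frozenset({2})) == prob(frozenset({1, 3})) + prob(frozenset({2}))
```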
Joint and Conditional Probability
We define the joint probability $P(A, B) = P(A \cap B)$: the probability that both $A$ and $B$ happen.
The conditional probability $P(A \mid B)$ is the probability that $A$ happens given that we know $B$ happened: $P(A \mid B) = P(A, B) / P(B)$ (defined when $P(B) > 0$).
Conditional and joint probabilities decompose as follows:

$$P(A, B) = P(A \mid B) P(B) = P(B \mid A) P(A)$$

From this we can derive Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$
We can marginalize a joint distribution by summing. If $B_1, B_2, \dots$ are disjoint events whose union is $\Omega$ (they partition the sample space), then:

$$P(A) = \sum_i P(A, B_i)$$

We call $P(A)$ computed this way a marginal probability.
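Bayes' theorem and marginalization in action. The numbers below (a 1% base rate and a hypothetical diagnostic test) are made up for illustration:

```python
# Hypothetical numbers: a condition with a 1% base rate, and a test with a
# 95% true-positive rate and a 10% false-positive rate.
p_d = 0.01       # P(D)
p_pos_d = 0.95   # P(+ | D)
p_pos_nd = 0.10  # P(+ | not D)

# Marginalize over the partition {D, not D} to get P(+):
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_d_pos = p_pos_d * p_d / p_pos
```

Even with a positive result, P(D | +) is only about 8.8%: the small base rate dominates the test's accuracy.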
Independence
Two events are independent if knowing the outcome of one tells you nothing about the probability of the other.
The following are true if and only if $A$ and $B$ are independent (assuming $P(A), P(B) > 0$):

- $P(A, B) = P(A) P(B)$
- $P(A \mid B) = P(A)$
- $P(B \mid A) = P(B)$
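A sketch checking these conditions on a fair die; the events are arbitrary examples:

```python
from fractions import Fraction

# A fair die; each event's probability is |A| / 6.
omega = frozenset(range(1, 7))

def prob(event):
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})  # even roll, P(A) = 1/2
B = frozenset({1, 2})     # roll of 1 or 2, P(B) = 1/3

# A and B are independent: P(A, B) = P(A) P(B), and P(A | B) = P(A).
assert prob(A & B) == prob(A) * prob(B)
assert prob(A & B) / prob(B) == prob(A)

C = frozenset({1, 2, 3})  # P(C) = 1/2, but C is NOT independent of A:
assert prob(A & C) != prob(A) * prob(C)
```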
Continuous Probability & Random Variables
If $\Omega$ is continuous, such as the real numbers, we cannot define probabilities in terms of individual elementary events: any single real number must have probability 0.
Instead, we define a sigma field where events are intervals:
$\mathcal{F}$ is the set of intervals, their complements, and their countable unions. It contains infinitesimally small intervals, but not singletons.
This is not the only way to define probabilities over continuous event spaces, but it is the common way of defining probabilities over real values.
This particular sigma field is called the Borel sigma algebra, and we will denote it $\mathcal{B}$.
We often talk about continuous distributions as the distribution of a random variable $X$.
We define continuous probabilities in terms of a distribution function $F_X$:

$$F_X(x) = P(X \le x)$$

This is also called the cumulative distribution function (CDF).

We can use it to compute the probability for any interval:

$$P(a < X \le b) = F_X(b) - F_X(a)$$
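As a concrete sketch, the CDF of a normal distribution can be written with the standard library's error function (`math.erf`), and interval probabilities fall out as differences:

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    # CDF of the normal distribution, written via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(a < X <= b) = F(b) - F(a); for a standard normal, about 68.3% of the
# probability mass lies within one standard deviation of the mean.
p = norm_cdf(1.0) - norm_cdf(-1.0)
```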
This probability is called the probability mass on a particular interval.
Distributions are often defined by a probability density function $p_X$, the derivative of the CDF: $p_X(x) = F_X'(x)$, so that $F_X(x) = \int_{-\infty}^{x} p_X(t) \, dt$.
Unlike probabilities or probability mass, densities can exceed 1.
When you use sns.distplot and it shows the kernel density estimator (KDE), it is showing you an estimate of the density. That is why the $y$ axis sometimes shows values greater than 1.
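A quick illustration of how a density can exceed 1 while the total probability mass stays 1 (the interval width is an arbitrary choice):

```python
# A uniform density on [0, 0.25]: the density is 4 everywhere on the
# interval, yet the total probability mass still integrates to 1.
width = 0.25
density = 1.0 / width         # 4.0 — a perfectly valid density value
total_mass = density * width  # 1.0
```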
We can also talk about joint and conditional continuous probabilities and densities. When marginalizing a continuous probability density, we replace the sum with an integral:

$$p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x, y) \, dy$$
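A numeric sketch of marginalization-by-integration, using the (arbitrarily chosen) joint density p(x, y) = x + y on the unit square and a midpoint-rule approximation:

```python
# Marginalize the joint density p(x, y) = x + y on the unit square
# (it integrates to 1) with a midpoint-rule approximation of the integral.
def p_joint(x, y):
    return x + y

n = 1000
dy = 1.0 / n

def p_x(x):
    # p_X(x) = integral over y in [0, 1] of p(x, y) dy; analytically x + 1/2
    return sum(p_joint(x, (j + 0.5) * dy) * dy for j in range(n))
```

Here `p_x(0.3)` approximates the analytic marginal x + 1/2 = 0.8.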
Note
Technically, a random variable for a probability space $(\Omega, \mathcal{F}, P)$ is a (measurable) function $X: \Omega \to \mathbb{R}$ that assigns a real number to each elementary event.
Expectation
The expected value of a random variable $X$ is the average of its values, weighted by their probabilities:

$$\mathrm{E}[X] = \sum_x x P(X = x)$$

If $X$ is continuous, we integrate over its density instead:

$$\mathrm{E}[X] = \int_{-\infty}^{\infty} x \, p_X(x) \, dx$$
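Both forms of expected value in a short sketch: an exact sum for a fair die, and a midpoint-rule integral for a Uniform(0, 1) variable (both examples are arbitrary):

```python
from fractions import Fraction

# Discrete: the expected value of a fair die roll, as an exact sum.
e_die = sum(x * Fraction(1, 6) for x in range(1, 7))  # 7/2

# Continuous: E[X] for X ~ Uniform(0, 1), approximating the integral of
# x * p(x) (with p(x) = 1 on [0, 1]) by a midpoint rule.
n = 1000
dx = 1.0 / n
e_unif = sum(((i + 0.5) * dx) * 1.0 * dx for i in range(n))
```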
Note
If we use the technical definition of a random variable, then we denote the expected value as an integral over the sample space:

$$\mathrm{E}[X] = \int_\Omega X(\omega) \, dP(\omega)$$
We can also talk about the conditional expectation $\mathrm{E}[X \mid A]$: the expected value of $X$ computed with the conditional probabilities given $A$.
Variance and Covariance
The variance of a random variable $X$ is the expected squared deviation from its mean:

$$\mathrm{Var}(X) = \mathrm{E}\left[(X - \mathrm{E}[X])^2\right]$$

The standard deviation is the square root of the variance ($\sigma_X = \sqrt{\mathrm{Var}(X)}$).

The covariance of two random variables is the expected value of the product of their deviations from mean:

$$\mathrm{Cov}(X, Y) = \mathrm{E}\left[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])\right]$$

The correlation normalizes the covariance by the standard deviations:

$$r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

We can also show that $\mathrm{Var}(X) = \mathrm{E}[X^2] - \mathrm{E}[X]^2$.
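The shortcut formula for variance can be checked exactly for a fair die (an arbitrary example) using exact fractions:

```python
from fractions import Fraction

# Verify Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2 for a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
e_x = sum(x * p for x, p in pmf.items())                    # E[X] = 7/2
var_def = sum((x - e_x) ** 2 * p for x, p in pmf.items())   # definition
var_short = sum(x * x * p for x, p in pmf.items()) - e_x ** 2
assert var_def == var_short == Fraction(35, 12)
```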
Random variables can also be described as independent in the same way as events: knowing one tells you nothing about the other.
If two random variables are independent, then their covariance is zero: $\mathrm{Cov}(X, Y) = 0$. (The converse does not hold in general.)
Properties of Expected Values
Expected value obeys a number of useful properties (here $a$, $b$, and $c$ are constants):

- Linearity of expectation: $\mathrm{E}[aX + bY] = a \mathrm{E}[X] + b \mathrm{E}[Y]$.
- If $X$ and $Y$ are independent, then $\mathrm{E}[XY] = \mathrm{E}[X] \mathrm{E}[Y]$.
- If $X = c$ is constant, then $\mathrm{E}[X] = c$.
- If $X \le Y$, then $\mathrm{E}[X] \le \mathrm{E}[Y]$.
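A simulation sketch of the linearity and independence properties, using independent Uniform(0, 1) draws (the seed, sample size, and constants are arbitrary):

```python
import random

# Simulation check of two expectation properties, using independent
# Uniform(0, 1) samples for X and Y.
random.seed(42)
n = 100_000
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

a, b = 3.0, -2.0

# Linearity: E[aX + bY] = a E[X] + b E[Y] (holds exactly, even in a sample).
lhs = mean([a * x + b * y for x, y in zip(xs, ys)])
rhs = a * mean(xs) + b * mean(ys)

# Independence: E[XY] is close to E[X] E[Y] = 0.25 in a large sample.
e_xy = mean([x * y for x, y in zip(xs, ys)])
```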
Expectation of Indicator Functions
Sets can be described by an indicator function (or characteristic function) $\mathbb{1}_A$, which maps each outcome to 1 if it is in the set and 0 otherwise:

$$\mathbb{1}_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A \end{cases}$$

Then the expected value of this function is the same as the probability of $A$:

$$\mathrm{E}[\mathbb{1}_A] = P(A)$$
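A sketch checking this identity exactly for the event “even roll” on a fair die (an arbitrary example):

```python
from fractions import Fraction

# The indicator of the event "even roll" on a fair die.
A = {2, 4, 6}

def indicator(w):
    return 1 if w in A else 0

# E[1_A] under the uniform die distribution equals P(A) = 1/2:
e_ind = sum(indicator(w) * Fraction(1, 6) for w in range(1, 7))
assert e_ind == Fraction(1, 2)
```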
Odds
Another way of working with probability is to compute with odds: the ratio of probabilities for or against an event. This is given by:

$$\mathrm{odds}(A) = \frac{P(A)}{1 - P(A)}$$

The log odds are often computationally convenient, and are the basis of logistic regression:

$$\operatorname{logit}(p) = \log \frac{p}{1 - p}$$
The logit function converts probabilities to log-odds.
We can also compute an odds ratio of two outcomes:

$$\mathrm{OR} = \frac{P(A) / (1 - P(A))}{P(B) / (1 - P(B))}$$
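A sketch of odds, log-odds, and the odds ratio; the helper names (`odds`, `logit`, `inv_logit`) and the probabilities are illustrative choices:

```python
import math

def odds(p):
    return p / (1.0 - p)

def logit(p):
    return math.log(odds(p))

def inv_logit(x):
    # the logistic (sigmoid) function, the inverse of logit
    return 1.0 / (1.0 + math.exp(-x))

# A probability of 0.75 means odds of 3 (3:1 in favor):
assert odds(0.75) == 3.0
# logit and the logistic function invert each other:
assert abs(inv_logit(logit(0.75)) - 0.75) < 1e-12
# The odds ratio comparing outcomes with p = 0.75 and p = 0.5:
odds_ratio = odds(0.75) / odds(0.5)  # 3.0
```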
Further Reading
If you want to dive more deeply into probability theory, Michael Betancourt's case studies are rather mathematically dense but quite good:
- Probability Theory (For Scientists and Engineers)
- Conditional Probability
- Product Placement (probability over product spaces)
For a book:
- Introduction to Probability by Grinstead and Snell
- An Introduction to Probability and Simulation - a hands-on online book using Python simulations