library(tidyverse)
options(repr.plot.height=4.5, repr.matrix.max.rows=15)
Let's flip a biased coin 100 times. Each flip comes up Red with probability 0.4 and Blue with probability 0.6:
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))
flips
What proportion of the flips came up red?
mean(flips == "Red")
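Let's repeat the 100-flip experiment 100 times, recording the proportion of red each time, to see the sampling distribution of that proportion:
many_means = sapply(1:100, function(i) {
    # proportion of Red in one experiment of 100 flips
    mean(sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6)) == "Red")
})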
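Plot the simulated proportions, with the true probability of red (0.4) marked by a red vertical line:
ggplot(data.frame(p=many_means)) +
    aes(x=p) +
    # draw the histogram on the density scale so it is comparable to the density curve
    geom_histogram(aes(y=after_stat(density)), binwidth=0.02) +
    geom_density(color="blue") +
    # true probability of Red
    geom_vline(xintercept=0.4, color="red")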
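The spread of this sampling distribution can be checked against theory. As a minimal sketch, the standard deviation of the simulated proportions should be close to the standard error of a sample proportion, sqrt(p(1-p)/n). The names `n_flips`, `p_red`, `n_reps`, `sim_props`, and `theory_se`, the seed, and the larger number of repetitions are assumptions introduced here for illustration, not part of the original notebook.

```r
# Illustrative names and seed; not from the original notebook.
set.seed(42)
n_flips = 100    # flips per experiment
p_red = 0.4      # true probability of Red
n_reps = 10000   # repeated experiments (more than above, for a steadier estimate)

# Proportion of Red in each simulated experiment
sim_props = replicate(n_reps, {
    mean(sample(c("Red", "Blue"), n_flips, replace=TRUE, prob=c(p_red, 1 - p_red)) == "Red")
})

# Theoretical standard error of a sample proportion: sqrt(p(1-p)/n)
theory_se = sqrt(p_red * (1 - p_red) / n_flips)

c(simulated_sd=sd(sim_props), theoretical_se=theory_se)
```

If the two numbers roughly agree, the width of the histogram is about what the binomial model predicts for 100 flips.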
We suspect the coin is unfair. Let's draw a fresh sample of 100 flips and test the null hypothesis that the probability of red is 0.5:
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))
prop.test(sum(flips == "Red"), length(flips), p=0.5)
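To see what `prop.test` computes, here is a hedged sketch that reproduces its one-sample chi-squared statistic and two-sided p-value by hand, assuming R's default continuity correction (`correct=TRUE`). The names `x`, `n`, `p0`, `yates`, `chi_sq`, `p_value`, and `result`, and the seed, are illustrative and not part of the original notebook.

```r
# Illustrative seed and names; not from the original notebook.
set.seed(1)
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))

x = sum(flips == "Red")   # observed number of Red
n = length(flips)         # number of flips
p0 = 0.5                  # probability of Red under the null hypothesis

# Continuity correction (at most 0.5), as prop.test applies by default
yates = min(0.5, abs(x - n * p0))

# Chi-squared statistic with 1 degree of freedom, and its two-sided p-value
chi_sq = (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
p_value = pchisq(chi_sq, df=1, lower.tail=FALSE)

# Compare against prop.test
result = prop.test(x, n, p=p0)
c(by_hand=chi_sq, prop_test=unname(result$statistic))
c(by_hand=p_value, prop_test=result$p.value)
```

A small p-value is evidence against a fair coin. Because the flips are random, the test will not reject the null hypothesis in every sample of 100 flips, even though the true probability of red is 0.4.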