library(tidyverse)

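# Notebook (IRkernel) display settings: plot height in inches, at most 15 rows printed for matrices/data frames
options(repr.plot.height=4.5, repr.matrix.max.rows=15)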

Let's simulate 100 flips of a biased coin that lands Red with probability 0.4 and Blue with probability 0.6:

# Draw 100 flips with replacement: each is "Red" with prob 0.4, "Blue" with prob 0.6
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))
flips

What proportion of the flips came up Red?

mean(flips == "Red")
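
Why does `mean()` give a proportion? `flips == "Red"` is a logical vector, and R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, so its mean is the fraction of `TRUE`s. A minimal check on a small hypothetical vector `toy` (illustrative values, not data from this notebook):

```r
# Five hypothetical flips, used only to illustrate the arithmetic
toy = c("Red", "Blue", "Red", "Blue", "Blue")
is_red = toy == "Red"         # TRUE FALSE TRUE FALSE FALSE
mean(is_red)                  # 2 TRUEs out of 5 -> 0.4
sum(is_red) / length(is_red)  # the same proportion, computed explicitly
```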

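Repeat the 100-flip experiment many times (100 repetitions here) and record the proportion of Red each time; the spread of those proportions approximates the sampling distribution: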

# 100 repetitions of the 100-flip experiment; keep the proportion of Red from each
many_means = sapply(1:100, function(i) {
    mean(sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6)) == "Red")
})
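
For a single proportion the spread of this sampling distribution is known in closed form: the standard error is $\sqrt{p(1-p)/n}$, about 0.049 for $p = 0.4$, $n = 100$. A sketch that checks this against simulation, swapping the `sample()` loop for `rbinom()` (the count of Red in 100 independent flips is Binomial(100, 0.4), so dividing by 100 gives the same distribution of proportions). The seed, the 10,000 repetitions, and the names `sim_props` and `theoretical_se` are arbitrary choices for illustration:

```r
set.seed(1)                      # arbitrary seed, for reproducibility
n = 100                          # flips per experiment
p = 0.4                          # true probability of Red
reps = 10000                     # number of simulated experiments

# Each rbinom() draw is the number of Red in n flips; dividing by n gives a proportion
sim_props = rbinom(reps, size=n, prob=p) / n

# Closed-form standard error of a sample proportion
theoretical_se = sqrt(p * (1 - p) / n)
c(sim_mean=mean(sim_props), sim_sd=sd(sim_props), theoretical_se=theoretical_se)
```

The simulated standard deviation should land close to 0.049; the 100-repetition loop above estimates the same quantity, just more noisily.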

Plot the simulated proportions, marking the true probability of Red (0.4) with a red line:

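ggplot(data.frame(p=many_means)) +
    aes(x=p) +
    # Draw the histogram on the density scale so it shares a y-axis with geom_density();
    # binwidth = 0.01 gives one bar per possible proportion (multiples of 1/100)
    geom_histogram(aes(y=after_stat(density)), binwidth=0.01) +
    geom_density(color="blue") +
    geom_vline(xintercept=0.4, color="red")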

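By the central limit theorem the sampling distribution of $\hat{p}$ is approximately Normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. A sketch that overlays that Normal curve with `stat_function()` instead of a kernel density estimate; it re-simulates the proportions so it runs on its own, and the 1,000 repetitions and the names `props` and `plt` are illustrative choices:

```r
library(ggplot2)

n = 100
p = 0.4
# 1,000 simulated experiments, each giving the proportion of Red in n flips
props = replicate(1000, mean(sample(c("Red", "Blue"), n, replace=TRUE, prob=c(p, 1 - p)) == "Red"))

plt = ggplot(data.frame(p_hat=props), aes(x=p_hat)) +
    # Density scale so the histogram and the Normal curve share a y-axis
    geom_histogram(aes(y=after_stat(density)), binwidth=0.01) +
    # Normal approximation: mean p, standard error sqrt(p(1-p)/n)
    stat_function(fun=dnorm, args=list(mean=p, sd=sqrt(p * (1 - p) / n)), color="blue") +
    geom_vline(xintercept=p, color="red")
plt
```

Using the theoretical curve rather than `geom_density()` shows how well the Normal approximation, which `prop.test()` relies on, fits at this sample size.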

Hypothesis Testing

We suspect the coin is unfair. Let $p$ be the probability that a flip lands Red, and test:

$H_0$: coin is fair ($p = 0.5$)
$H_a$: coin is unfair ($p \ne 0.5$)
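
One way to see what these hypotheses mean in practice is to simulate under $H_0$: flip a fair coin 100 times, many times over, and count how often the number of Red lands at least as far from 50 as the observed count. That fraction approximates a two-sided p-value. This is a simulation-based sketch, not what `prop.test()` computes; the observed count `x_obs = 39` is an example value, and the seed and repetition count are arbitrary:

```r
set.seed(42)                 # arbitrary seed, for reproducibility
n = 100                      # flips per experiment
p0 = 0.5                     # probability of Red under H0 (fair coin)
x_obs = 39                   # example observed count of Red
reps = 10000                 # number of simulated fair-coin experiments

# Number of Red in n flips of a fair coin, repeated reps times
null_counts = rbinom(reps, size=n, prob=p0)

# Two-sided: fraction of fair-coin experiments at least as far from n * p0 as x_obs
sim_p_value = mean(abs(null_counts - n * p0) >= abs(x_obs - n * p0))
sim_p_value
```

Comparing integer counts rather than proportions keeps the `>=` comparison exact. For 39 Red out of 100 the result should land near the exact binomial p-value of about 0.035.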

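# A fresh set of 100 flips from the same biased coin (true p = 0.4)
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))

# One-sample proportion test: x = number of Red, n = number of flips, null p = 0.5
prop.test(sum(flips == "Red"), length(flips), p=0.5)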

	1-sample proportions test with continuity correction

data:  sum(flips == "Red") out of length(flips), null probability 0.5
X-squared = 4.41, df = 1, p-value = 0.03573
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2955674 0.4929886
sample estimates:
   p 
0.39
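
Reading the output: the p-value 0.036 is below 0.05 and the 95% confidence interval (0.296, 0.493) excludes 0.5, so at the 5% level we reject $H_0$ for this run. The reported `X-squared` can be reproduced by hand. With $x$ Red out of $n$ and null value $p_0$, `prop.test()` applies Yates' continuity correction, which for one proportion reduces to $X^2 = (|x - np_0| - 0.5)^2 / (np_0(1-p_0))$ (the 0.5 is capped at $|x - np_0|$). A sketch plugging in the 39 Red out of 100 shown above, plus `binom.test()` as an exact alternative that needs no Normal approximation:

```r
n = 100
x = 39                       # Red count from the run above
p0 = 0.5                     # null probability

# Yates' continuity correction, capped at 0.5 as prop.test() does
yates = min(0.5, abs(x - n * p0))
stat = (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
p_value = pchisq(stat, df=1, lower.tail=FALSE)
c(X_squared=stat, p_value=p_value)   # 4.41 and about 0.0357

# Exact binomial test of the same hypotheses (no continuity correction needed)
binom.test(x, n, p=p0)$p.value
```

The exact test gives a p-value of about 0.035 here, so both methods lead to the same conclusion for this sample.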

Enriching Our Modeling