In [6]:
library(tidyverse)
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats
In [8]:
options(repr.plot.height=4.5, repr.matrix.max.rows=15)

Let's flip a biased coin 100 times. It comes up "Red" with probability 0.4 and "Blue" with probability 0.6:

In [3]:
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))
flips
  1. 'Blue'
  2. 'Blue'
  3. 'Blue'
  4. 'Blue'
  5. 'Blue'
  6. 'Blue'
  7. 'Red'
  8. 'Blue'
  9. 'Blue'
  10. 'Blue'
  11. 'Red'
  12. 'Blue'
  13. 'Blue'
  14. 'Blue'
  15. 'Red'
  16. 'Blue'
  17. 'Red'
  18. 'Blue'
  19. 'Blue'
  20. 'Red'
  21. 'Blue'
  22. 'Blue'
  23. 'Blue'
  24. 'Blue'
  25. 'Blue'
  26. 'Blue'
  27. 'Red'
  28. 'Blue'
  29. 'Blue'
  30. 'Blue'
  31. 'Blue'
  32. 'Blue'
  33. 'Blue'
  34. 'Red'
  35. 'Red'
  36. 'Red'
  37. 'Blue'
  38. 'Blue'
  39. 'Blue'
  40. 'Blue'
  41. 'Blue'
  42. 'Blue'
  43. 'Blue'
  44. 'Red'
  45. 'Red'
  46. 'Blue'
  47. 'Blue'
  48. 'Red'
  49. 'Red'
  50. 'Blue'
  51. 'Blue'
  52. 'Red'
  53. 'Blue'
  54. 'Blue'
  55. 'Red'
  56. 'Blue'
  57. 'Blue'
  58. 'Blue'
  59. 'Blue'
  60. 'Blue'
  61. 'Blue'
  62. 'Red'
  63. 'Blue'
  64. 'Red'
  65. 'Red'
  66. 'Blue'
  67. 'Red'
  68. 'Blue'
  69. 'Red'
  70. 'Red'
  71. 'Blue'
  72. 'Red'
  73. 'Blue'
  74. 'Red'
  75. 'Blue'
  76. 'Red'
  77. 'Red'
  78. 'Red'
  79. 'Red'
  80. 'Blue'
  81. 'Red'
  82. 'Blue'
  83. 'Blue'
  84. 'Blue'
  85. 'Red'
  86. 'Red'
  87. 'Blue'
  88. 'Blue'
  89. 'Blue'
  90. 'Blue'
  91. 'Blue'
  92. 'Blue'
  93. 'Blue'
  94. 'Blue'
  95. 'Blue'
  96. 'Red'
  97. 'Blue'
  98. 'Red'
  99. 'Blue'
  100. 'Red'

What proportion of the flips came up red?

In [4]:
mean(flips == "Red")
0.33
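
The call above works because `flips == "Red"` is a logical vector and `mean()` counts each `TRUE` as 1. A minimal sketch of the same tally with `table()` and `prop.table()` follows. The seed value and the names `demo_flips`, `counts`, `props`, and `prop_red` are illustrative and not part of the original notebook, so the result will not match the 0.33 shown above.

```r
# Illustrative seed so the sketch is repeatable; not from the original notebook.
set.seed(42)
demo_flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))

# Counts per colour, then the same counts as proportions.
counts = table(demo_flips)
props = prop.table(counts)
print(counts)
print(props)

# The "Red" entry of the proportion table equals the mean of the logical vector.
prop_red = mean(demo_flips == "Red")
prop_red
```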

Let's repeat the 100-flip experiment 100 times and record the proportion of red in each run, to see the sampling distribution of that proportion:

In [5]:
many_means = sapply(1:100, function(i) {
    mean(sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6)) == "Red")
})

Plot the simulated proportions, with the true probability of red (0.4) marked by a vertical line:

In [14]:
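ggplot(data.frame(p=many_means)) +
    aes(x=p) +
    # draw the histogram on the density scale so it is comparable with geom_density()
    # (newer ggplot2 versions spell this after_stat(density))
    geom_histogram(aes(y=..density..)) +
    geom_density(color="blue") +
    # true probability of "Red"
    geom_vline(xintercept=0.4, color="red")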
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
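
For n independent flips with true probability p, the sample proportion has mean p and standard deviation sqrt(p(1 - p)/n), which is about 0.049 for p = 0.4 and n = 100. The sketch below checks a simulation against that value. It is an illustration under stated assumptions: the seed and the names `p_true`, `n_flips`, `se_theory`, and `sim_means` are made up here, it draws counts with `rbinom()` instead of the `sample()` call used above, and it uses 1,000 repetitions rather than 100 to get a steadier estimate.

```r
set.seed(1)  # illustrative seed, not from the original notebook
p_true = 0.4
n_flips = 100

# Theoretical standard deviation of the sample proportion.
se_theory = sqrt(p_true * (1 - p_true) / n_flips)

# rbinom() returns the number of reds in n_flips flips; dividing gives the proportion.
sim_means = rbinom(1000, size=n_flips, prob=p_true) / n_flips

# Simulated mean and spread next to the theoretical spread.
c(mean=mean(sim_means), sd=sd(sim_means), se_theory=se_theory)
```

The simulated mean should land close to 0.4 and the simulated standard deviation close to `se_theory`, which is the spread the histogram above is showing.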

Hypothesis Testing

Suppose we suspect the coin is unfair. We can check that with a two-sided one-sample proportion test:

  • $H_0$: coin is fair ($p = 0.5$)
  • $H_a$: coin is unfair ($p \ne 0.5$)
In [25]:
flips = sample(c("Red", "Blue"), 100, replace=TRUE, prob=c(0.4, 0.6))
In [26]:
prop.test(sum(flips == "Red"), length(flips), 0.5)
	1-sample proportions test with continuity correction

data:  sum(flips == "Red") out of length(flips), null probability 0.5
X-squared = 4.41, df = 1, p-value = 0.03573
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.2955674 0.4929886
sample estimates:
   p 
0.39 
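
The reported statistic can be reproduced by hand. With 39 reds out of 100 flips (the sample estimate above) and a null probability of 0.5, the continuity-corrected statistic is (|x - n p0| - 0.5)^2 / (n p0 (1 - p0)), compared against a chi-squared distribution with 1 degree of freedom. A short sketch, with variable names chosen here for illustration:

```r
x = 39     # observed number of reds, from the output above
n = 100    # number of flips
p0 = 0.5   # null probability

# Continuity-corrected chi-squared statistic for a single proportion.
x_squared = (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# Upper-tail probability under chi-squared with 1 degree of freedom.
p_value = pchisq(x_squared, df=1, lower.tail=FALSE)

c(x_squared=x_squared, p_value=p_value)
```

This gives X-squared = 4.41 and a p-value of about 0.0357, matching the `prop.test()` output. Since the p-value is below 0.05 and the 95% confidence interval (0.296 to 0.493) excludes 0.5, we reject $H_0$ at the 5% level.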

Enriching our Modeling

In [ ]: