Hypothesis Testing
Notes I've taken on hypothesis testing, drawn mostly from courses on codecademy.com.
Types of Errors
- Type 1 error: false positive, rejecting a null hypothesis that is actually true in the population
- Type 2 error: false negative, failing to reject a null hypothesis that is actually false in the population
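A rough simulation can make the two error types concrete. This is only a sketch: the sample size, effect size (0.3), threshold (0.05) and number of trials are arbitrary assumptions, and it relies on numpy and scipy being installed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05            # assumed significance threshold
n_trials, n = 2000, 30  # assumed number of experiments and sample size

# Null is true: every sample comes from a population with mean 0.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))
p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)    # rejected a true null

# Null is false: the population mean is actually 0.3, not 0.
alt_samples = rng.normal(loc=0.3, scale=1.0, size=(n_trials, n))
p_alt = stats.ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)    # failed to reject a false null

print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Type 2 error rate: {type2_rate:.3f}")
```

The Type 1 rate should land close to the chosen threshold, while the Type 2 rate depends on the effect size and the sample size.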
Null Hypothesis
- There's no association between the variables
- The p-value is the probability of observing results at least as extreme as those in the sample, assuming the null hypothesis is true
- p-value < alpha (the significance level, commonly 0.05) means the result is statistically significant and the null hypothesis can be rejected in favor of the alternative hypothesis
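The decision rule itself is a single comparison. A tiny sketch, where `decide` is a hypothetical helper name and 0.05 is an assumed default for alpha:

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))  # reject the null hypothesis
print(decide(0.20))  # fail to reject the null hypothesis
```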
Numerical Data
Samples should be:
- Normally distributed
- Similar in spread, with roughly equal standard deviations (std ratio between 0.9 and 1.1)
- Independent
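The standard deviation check can be sketched with numpy. The helper names and the sample values below are made up for illustration. Normality and independence are not checked here.

```python
import numpy as np

def std_ratio(sample_a, sample_b):
    """Ratio of the two samples' standard deviations."""
    return np.std(sample_a) / np.std(sample_b)

def has_similar_spread(sample_a, sample_b, low=0.9, high=1.1):
    """True when the std ratio falls inside the 0.9 to 1.1 rule of thumb."""
    return low <= std_ratio(sample_a, sample_b) <= high

# Made-up example data
a = [10, 12, 14, 16, 18]
b = [21, 23, 25, 27, 29.5]
wide = [0, 10, 20, 30, 40]

print(has_similar_spread(a, b))     # True
print(has_similar_spread(a, wide))  # False
```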
1 Sample T-Test
- Compares a sample mean to a hypothetical population mean
- Null hypothesis: the observed sample comes from a population whose mean equals the specified value
- Alternative hypothesis: the observed sample comes from a population whose mean differs from the specified value
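A minimal sketch of a one-sample t-test with `scipy.stats.ttest_1samp`. The sample values and the hypothetical population mean of 30 are invented for illustration.

```python
from scipy.stats import ttest_1samp

# Made-up sample; 30 is the assumed hypothetical population mean
sample = [31, 34, 29, 33, 35, 30, 32, 36, 28, 34]
expected_mean = 30

tstat, pval = ttest_1samp(sample, expected_mean)
print(f"t = {tstat:.3f}, p = {pval:.4f}")

if pval < 0.05:
    print("Reject the null: the population mean likely differs from 30")
else:
    print("Fail to reject the null")
```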
2 Sample T-Test
- Compares the means of two sets of numerical data
- Null hypothesis: the two observed samples come from populations with the same mean
- Alternative hypothesis: the two observed samples come from populations with different means
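A sketch of a two-sample t-test with `scipy.stats.ttest_ind`, using two invented groups whose means are clearly far apart.

```python
from scipy.stats import ttest_ind

# Made-up data for two independent groups
group_a = [20, 22, 19, 24, 21, 23, 20, 22]
group_b = [27, 29, 26, 30, 28, 31, 27, 29]

tstat, pval = ttest_ind(group_a, group_b)
print(f"t = {tstat:.3f}, p = {pval:.6f}")
print("significant" if pval < 0.05 else "not significant")
```

By default `ttest_ind` assumes equal variances, which matches the assumption listed for numerical data above.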
ANOVA (Analysis of Variance) Test
- Null hypothesis: all of the samples come from populations with the same mean
- Alternative hypothesis: at least one pair of populations (from which the samples were drawn) has different means
- The test alone cannot tell us which pair(s) of populations differ significantly
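A sketch of a one-way ANOVA with `scipy.stats.f_oneway` on three invented groups. Only one group is shifted noticeably, and the test reports that a difference exists without saying where.

```python
from scipy.stats import f_oneway

# Made-up data: group_b is shifted, group_a and group_c are close
group_a = [20, 22, 19, 24, 21, 23, 20, 22]
group_b = [27, 29, 26, 30, 28, 31, 27, 29]
group_c = [21, 23, 20, 25, 22, 24, 21, 23]

fstat, pval = f_oneway(group_a, group_b, group_c)
print(f"F = {fstat:.2f}, p = {pval:.2e}")
```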
Tukey's Range Test
- Determines which pairs of datasets are significantly different, typically run after ANOVA rejects the null hypothesis
- From statsmodels, not scipy
- We have to provide the function with one list of all of the data and a list of labels that tell the function which elements of the list are from which set
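A sketch of the data-plus-labels layout described above, using `pairwise_tukeyhsd` from statsmodels. The three groups and their labels "a", "b", "c" are invented, and 0.05 is an assumed significance level.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up data: group_b is shifted, group_a and group_c are close
group_a = [20, 22, 19, 24, 21, 23, 20, 22]
group_b = [27, 29, 26, 30, 28, 31, 27, 29]
group_c = [21, 23, 20, 25, 22, 24, 21, 23]

# One flat array of all values, plus one label per value
values = np.concatenate([group_a, group_b, group_c])
labels = np.array(["a"] * len(group_a) + ["b"] * len(group_b) + ["c"] * len(group_c))

tukey = pairwise_tukeyhsd(values, labels, alpha=0.05)
print(tukey.summary())  # one row per pair of groups
print(tukey.reject)     # True where a pair differs significantly
```

With this data the pairs involving "b" should be flagged, while "a" and "c" should not.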
Categorical Data
Binomial Test
- The binomial distribution describes the probability of each possible number of “successes” in an experiment with a fixed number of “trials”
- Null hypothesis: the sample comes from a binomial distribution with the expected probability of success
- Alternative hypothesis: the sample comes from a binomial distribution with a different probability of success
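A sketch of a binomial test with `scipy.stats.binomtest`, which assumes a reasonably recent SciPy (older releases exposed this as `binom_test`). The counts and the expected success rate of 6% are invented for illustration.

```python
from scipy.stats import binomtest

# Made-up numbers: 510 successes in 10000 trials, expected rate 6%
successes = 510
trials = 10000
expected_rate = 0.06

result = binomtest(successes, n=trials, p=expected_rate)
pval = result.pvalue
print(f"p = {pval:.6f}")
print("significant" if pval < 0.05 else "not significant")
```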
Chi Squared
- Tests whether two categorical variables are associated
- Input is a contingency table: columns are different conditions, rows are different outcomes
- Null hypothesis: there is no association between the variables
- Alternative hypothesis: there is an association between the variables
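A sketch of a chi-squared test with `scipy.stats.chi2_contingency`. The contingency table is invented, with columns as conditions and rows as outcomes, as described above.

```python
from scipy.stats import chi2_contingency

# Made-up contingency table
#        condition A  condition B
table = [[60,          30],   # outcome: success
         [40,          70]]   # outcome: failure

chi2, pval, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {pval:.6f}, dof = {dof}")
print(expected)  # counts expected if there were no association
```

For a 2x2 table SciPy applies Yates' continuity correction by default, so the statistic is slightly smaller than the uncorrected value.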