
Hypothesis Testing

Notes I've taken on hypothesis testing, drawn mostly from courses on codecademy.com.

Types of Errors

Type 1 error: false positive, rejecting a null hypothesis that is actually true in the population
Type 2 error: false negative, failing to reject a null hypothesis that is actually false in the population
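One way to make the Type 1 error rate concrete is to simulate it. The sketch below uses only the standard library. It draws many samples from a population where the null hypothesis is true, runs a two-sided z-test on each one, and counts how often the null is wrongly rejected. The long-run rate should land close to alpha. The population mean and standard deviation, the sample size, and the use of a z-test with a known standard deviation are illustrative assumptions, not part of the course material.

```python
import random
from statistics import NormalDist, fmean

def type1_error_rate(alpha=0.05, n=30, trials=20_000, seed=0):
    """Estimate how often a true null hypothesis gets rejected.

    Assumes a z-test with a known population std; all parameters are illustrative.
    """
    rng = random.Random(seed)
    mu, sigma = 50.0, 10.0  # hypothetical population; the null (mean == mu) is true
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value (~1.96)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # false positive: rejected a null that is true
    return rejections / trials

rate = type1_error_rate()
print(rate)  # should be close to alpha = 0.05
```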

Null Hypothesis

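The null hypothesis is the default claim of no effect: no association between the variables, or no difference between the groups
The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in the sample; it is not the probability that rejecting the null is a mistake
p-value < alpha (the chosen significance level, often 0.05, which is also the Type 1 error rate we are willing to accept) means the result is statistically significant, so the null hypothesis can be rejected in favor of the alternative hypothesis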
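A p-value is a tail probability computed under the assumption that the null hypothesis is true. The sketch below shows this for a test statistic that follows a standard normal distribution under the null (a simplifying assumption; t-tests use the t distribution instead), together with the usual "reject if p < alpha" decision rule. The helper names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. computed assuming the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Standard decision rule: reject the null when p < alpha."""
    return "reject null" if p_value < alpha else "fail to reject null"

p = two_sided_p_value(2.5)
print(p, decide(p))
```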

Numerical Data

Samples should:

Be normally distributed
Have equal standard deviations (rule of thumb: a ratio of standard deviations between 0.9 and 1.1)
Be independent
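The equal-standard-deviation rule of thumb above can be checked with a couple of lines of standard-library Python. This is a minimal sketch assuming the ratio is taken as the standard deviation of one sample divided by the other's; the function names and the sample values are hypothetical. Normality is usually checked separately, for example with a histogram or a test such as scipy.stats.shapiro, and independence comes from how the data was collected rather than from a calculation.

```python
from statistics import stdev

def std_ratio(a, b):
    """Ratio of sample standard deviations, a over b."""
    return stdev(a) / stdev(b)

def equal_std_rule_of_thumb(a, b, low=0.9, high=1.1):
    """True if the std ratio falls inside the 0.9 to 1.1 range."""
    return low <= std_ratio(a, b) <= high

# Hypothetical measurements from two groups
group_a = [4.1, 5.0, 5.9, 4.8, 5.2]
group_b = [4.0, 5.1, 5.9, 4.9, 5.1]
print(std_ratio(group_a, group_b), equal_std_rule_of_thumb(group_a, group_b))
```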

1 Sample T-Test

Compares a sample mean to a hypothesized population mean
Null hypothesis: a prediction that the observed sample comes from a population whose mean equals the hypothesized value
- The population mean equals the specified mean value
Alternative hypothesis: a prediction that the observed sample comes from a population with a different mean
- The population mean is different from the specified mean value
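In SciPy the one-sample t-test is scipy.stats.ttest_1samp, which takes the sample and the hypothesized population mean and returns a result with .statistic and .pvalue (two-sided by default). The sample values, the hypothesized mean of 5.0, and alpha = 0.05 below are made up for illustration.

```python
from scipy.stats import ttest_1samp

# Hypothetical sample measurements and hypothesized population mean
sample = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.4, 4.7]
hypothesized_mean = 5.0

# Null: the population mean equals hypothesized_mean (two-sided test by default)
result = ttest_1samp(sample, hypothesized_mean)
print(result.statistic, result.pvalue)

alpha = 0.05
if result.pvalue < alpha:
    print("Reject the null: the mean appears to differ from", hypothesized_mean)
else:
    print("Fail to reject the null")
```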

2 Sample T-Test

Compares the means of two independent samples of numerical data
Null hypothesis: the two observed samples come from populations with the same mean
Alternative hypothesis: the two observed samples come from populations with different means
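The two-sample version in SciPy is scipy.stats.ttest_ind. Its default (equal_var=True) is Student's t-test, which relies on the equal-standard-deviation assumption listed under Numerical Data; passing equal_var=False runs Welch's t-test instead. The group values below are hypothetical.

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two independent groups
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]

# Null: both samples come from populations with the same mean
result = ttest_ind(group_a, group_b)  # Student's t-test (equal_var=True)
print(result.statistic, result.pvalue)

# Welch's variant, if the standard deviations look unequal
welch = ttest_ind(group_a, group_b, equal_var=False)
print(welch.statistic, welch.pvalue)
```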

ANOVA (Analysis of Variance) Test

Compares the means of several (typically three or more) samples at once
Null hypothesis: all of the samples come from populations with the same mean
Alternative hypothesis: at least one pair of populations (from which the samples were drawn) has different means
The test does not tell us which pair(s) of populations differ significantly
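SciPy's one-way ANOVA is scipy.stats.f_oneway, which takes each sample as a separate argument. The three groups below are hypothetical and chosen so that one group clearly stands out; note that even then, the result only says that some difference exists.

```python
from scipy.stats import f_oneway

# Hypothetical samples from three groups
group_a = [10, 11, 12, 11, 10]
group_b = [10, 12, 11, 10, 11]
group_c = [20, 21, 19, 22, 20]

# Null: all three samples come from populations with the same mean
result = f_oneway(group_a, group_b, group_c)
print(result.statistic, result.pvalue)
# A small p-value says at least one mean differs, but not which one
```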

Tukey's Range Test

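Determines which pairs of datasets have significantly different means, typically run after ANOVA finds a significant difference
The function used (pairwise_tukeyhsd) comes from statsmodels, not scipy (newer SciPy releases, 1.8 and later, also include scipy.stats.tukey_hsd)
We have to pass the function one flat list of all of the data plus a parallel list of labels that says which set each element came from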
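A minimal sketch of the flat-list-plus-labels input with statsmodels' pairwise_tukeyhsd(endog, groups, alpha). The groups reuse made-up values where only group c differs. The result's .reject array holds one flag per pair, in the order (a, b), (a, c), (b, c) for sorted labels.

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical samples from three groups
group_a = [10, 11, 12, 11, 10]
group_b = [10, 12, 11, 10, 11]
group_c = [20, 21, 19, 22, 20]

# One flat list of every observation...
values = group_a + group_b + group_c
# ...and a parallel list of labels saying which group each value came from
labels = ["a"] * len(group_a) + ["b"] * len(group_b) + ["c"] * len(group_c)

result = pairwise_tukeyhsd(values, labels, alpha=0.05)
print(result.summary())  # each pair, mean difference, adjusted p-value, reject flag
```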

Categorical Data

Binomial Test

The binomial distribution describes the number of “successes” expected in a fixed number of independent “trials”, each with the same probability of success $p$
Null hypothesis: the sample comes from a binomial distribution with the hypothesized probability of success $p$
Alternative hypothesis: the sample comes from a binomial distribution with a different probability of success
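In current SciPy the binomial test is scipy.stats.binomtest(k, n, p), which returns a result object with a .pvalue attribute. Older material often uses scipy.stats.binom_test, which has been deprecated and removed in recent SciPy releases. The coin-flip numbers below are hypothetical.

```python
from scipy.stats import binomtest

# Hypothetical data: 60 heads in 100 coin flips
successes = 60
trials = 100
p_null = 0.5  # probability of success under the null hypothesis

# Two-sided by default; alternative="greater" or "less" gives a one-sided test
result = binomtest(successes, n=trials, p=p_null)
print(result.pvalue)
```

With alpha = 0.05 this p-value comes out just above 0.05, so we would fail to reject the null even though 60 out of 100 looks lopsided.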

Chi Squared

Tests whether two categorical variables are associated, using a contingency table of counts
Columns are different conditions, rows are different outcomes
Null hypothesis: there is no association between the variables
Alternative hypothesis: there is an association between the variables
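SciPy's chi-squared test of independence is scipy.stats.chi2_contingency, which takes the contingency table and returns the test statistic, p-value, degrees of freedom, and the counts expected if the variables were not associated. The table below is hypothetical and follows the layout above: columns are conditions and rows are outcomes. For 2x2 tables SciPy applies Yates' continuity correction by default (correction=True).

```python
from scipy.stats import chi2_contingency

# Hypothetical counts:
# columns = conditions (version A, version B)
# rows    = outcomes (clicked, did not click)
observed = [
    [30, 20],
    [10, 40],
]

chi2, pvalue, dof, expected = chi2_contingency(observed)
print(chi2, pvalue, dof)
print(expected)  # counts expected under "no association"
```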
