Hypothesis Testing

Notes I've taken on hypothesis testing, mostly drawn from courses on codecademy.com.

Types of Errors

  • Type 1 error: false positive, rejecting a null hypothesis that is actually true in the population
  • Type 2 error: false negative, failing to reject a null hypothesis that is actually false in the population
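The two error types can be written out as a small lookup. This is only an illustrative sketch: `classify_outcome` is a hypothetical helper, and in practice we never know for certain whether the null hypothesis is true.

```python
def classify_outcome(null_is_true: bool, rejected_null: bool) -> str:
    """Label a test decision, given the (normally unknown) truth about the null."""
    if null_is_true and rejected_null:
        return "type 1 error (false positive)"
    if not null_is_true and not rejected_null:
        return "type 2 error (false negative)"
    # Either a true null was kept or a false null was rejected.
    return "correct decision"


print(classify_outcome(null_is_true=True, rejected_null=True))
print(classify_outcome(null_is_true=False, rejected_null=False))
```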

Null Hypothesis

  • There's no association between the variables
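  • The p-value is the probability of observing data at least as extreme as the sample, assuming the null hypothesis is true
  • p-value < alpha (the significance threshold, i.e. the Type 1 error rate we are willing to accept) means the result is statistically significant and the null hypothesis can be rejected in favor of the alternative hypothesis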
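The decision rule is a single comparison. A minimal sketch, assuming the common default of alpha = 0.05 (the notes do not fix a value) and a hypothetical helper name `reject_null`:

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the p-value falls below the significance threshold."""
    return p_value < alpha


print(reject_null(0.01))  # below alpha: reject the null hypothesis
print(reject_null(0.20))  # above alpha: fail to reject the null hypothesis
```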

Numerical Data

Samples should:

  1. Be approximately normally distributed
  2. Have roughly equal standard deviations (ratio of standard deviations between 0.9 and 1.1)
  3. Be independent of each other
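The standard deviation check can be sketched with the standard library. The sample values are made up for illustration, and `std_ratio` is a hypothetical helper. Normality and independence need separate checks (for example a histogram and knowledge of how the data was collected).

```python
from statistics import pstdev


def std_ratio(sample_a, sample_b):
    """Ratio of the two samples' (population) standard deviations."""
    return pstdev(sample_a) / pstdev(sample_b)


# Made-up samples, for illustration only.
sample_a = [10, 12, 14, 16, 18]
sample_b = [11, 13, 15, 17, 19]

ratio = std_ratio(sample_a, sample_b)
similar_spread = 0.9 <= ratio <= 1.1
print(ratio, similar_spread)
```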

1 Sample T-Test

  • Compares a sample mean to a hypothetical population mean
  • Null hypothesis: a prediction that the observed sample comes from a population with the same mean
    • The population mean equals the specified mean value
  • Alternative hypothesis: a prediction that the observed sample comes from a population with a different mean
    • The population mean is different from the specified mean value
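A minimal sketch of a 1 sample t-test using `scipy.stats.ttest_1samp`. The ages and the hypothesized mean of 30 are made-up values, not taken from the course.

```python
from scipy.stats import ttest_1samp

# Made-up sample; the question is whether it plausibly comes from
# a population whose mean is 30.
ages = [31, 34, 29, 36, 33, 30, 35, 32]
hypothesized_mean = 30

t_stat, p_value = ttest_1samp(ages, hypothesized_mean)
print(t_stat, p_value)
```

If `p_value` is below the chosen alpha, the null hypothesis that the population mean equals `hypothesized_mean` is rejected.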

2 Sample T-Test

  • Compares two sets of numerical data
  • Null hypothesis: the two observed samples come from populations of the same mean
  • Alternative hypothesis: the two observed samples come from populations of different means
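A 2 sample t-test can be sketched with `scipy.stats.ttest_ind`. The two samples are made up, and the call uses the default setting, which assumes equal population variances.

```python
from scipy.stats import ttest_ind

# Made-up samples from two hypothetical groups.
group_a = [20, 22, 19, 24, 21, 23]
group_b = [25, 27, 26, 29, 24, 28]

t_stat, p_value = ttest_ind(group_a, group_b)
print(t_stat, p_value)
```

A small `p_value` suggests the two samples come from populations with different means.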

ANOVA (Analysis of Variance) Test

  • Null hypothesis: all of the samples come from populations with the same mean
  • Alternative hypothesis: at least one pair of populations (from which the samples were drawn) has different means
  • ANOVA alone does not tell us which pair(s) of populations differ significantly
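A minimal ANOVA sketch using `scipy.stats.f_oneway`, which accepts each sample as a separate argument. The three groups are made-up data.

```python
from scipy.stats import f_oneway

# Made-up samples; group_b is deliberately shifted upward.
group_a = [20, 22, 19, 24, 21, 23]
group_b = [25, 27, 26, 29, 24, 28]
group_c = [21, 20, 23, 22, 19, 24]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)
```

A significant result here only says that some difference exists among the three groups, not where it lies.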

Tukey's Range Test

  • Determines which pairs of datasets are significantly different, typically after a significant ANOVA result
  • Uses statsmodels (pairwise_tukeyhsd), not scipy
  • We have to provide the function with one list of all of the data and a list of labels that tell the function which elements of the list are from which set
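A sketch of the call using `pairwise_tukeyhsd` from `statsmodels.stats.multicomp`. The data and the group labels "a", "b", "c" are made up. Note how the values are combined into one array, with a parallel list of labels.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up samples; group_b is deliberately shifted upward.
group_a = [20, 22, 19, 24, 21, 23]
group_b = [25, 27, 26, 29, 24, 28]
group_c = [21, 20, 23, 22, 19, 24]

# One flat array of all values, plus one label per value.
values = np.array(group_a + group_b + group_c)
labels = np.array(["a"] * len(group_a) + ["b"] * len(group_b) + ["c"] * len(group_c))

result = pairwise_tukeyhsd(values, labels, alpha=0.05)
print(result)         # summary table, one row per pair of groups
print(result.reject)  # True where a pair differs significantly
```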

Categorical Data

Binomial Test

  • The binomial distribution describes the number of expected “successes” in an experiment with some number of “trials”
  • Null hypothesis: the sample comes from a binomial distribution with the same p (probability of success)
  • Alternative hypothesis: the sample comes from a binomial distribution with a different p
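A minimal sketch using `scipy.stats.binomtest`, assuming SciPy 1.7 or later (older versions exposed this as `binom_test`). The counts and the expected success rate of 0.1 are made-up values.

```python
from scipy.stats import binomtest

# Made-up numbers: 41 successes observed in 500 trials,
# tested against an expected success probability of 0.1.
result = binomtest(k=41, n=500, p=0.1)
p_value = result.pvalue
print(p_value)
```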

Chi Squared

  • Tests whether two categorical variables are associated
  • The data goes in a contingency table: columns are different conditions, rows are different outcomes
  • Null hypothesis: there is no association between the variables
  • Alternative hypothesis: there is a statistically significant association between the variables
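A sketch using `scipy.stats.chi2_contingency` on a made-up 2x2 contingency table, laid out with conditions as columns and outcomes as rows. For 2x2 tables SciPy applies Yates' continuity correction by default.

```python
from scipy.stats import chi2_contingency

# Made-up counts. Columns: condition A, condition B.
# Rows: outcome "purchased", outcome "did not purchase".
table = [
    [30, 45],
    [70, 55],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)
print(expected)  # counts expected if the variables were not associated
```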