There is no relationship between treatment and outcome; the difference is due to chance.
Alternative hypothesis:
There is a relationship; the difference is not due to chance.
Approach
Under the null hypothesis, treatment has NO impact on y (the outcome).
This means that if we were to change the values of the treatment variable, the values of y would stay the same.
So…we can simulate the null distribution by:
Reshuffling the treatment variable
Calculating the treatment effect
Repeating many times
Then we can ask: how likely would we be to observe the treatment effect in our data, if there is no effect of the treatment?
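The steps above can be sketched in base R. This is a minimal illustration on simulated data, not the study's data: the names `treatment`, `y`, and `diff_in_means`, the sample size, and the choice of 2,000 shuffles are all assumptions made for the example.

```r
# Hypothetical data: 200 units, half treated, with a binary outcome.
set.seed(123)
n <- 200
treatment <- rep(c(0, 1), each = n / 2)
y <- rbinom(n, size = 1, prob = ifelse(treatment == 1, 0.30, 0.15))

# Treatment effect: difference in mean outcome (treated minus control).
diff_in_means <- function(y, treatment) {
  mean(y[treatment == 1]) - mean(y[treatment == 0])
}

observed <- diff_in_means(y, treatment)

# Reshuffle the treatment labels, recompute the effect, repeat many times.
null_dist <- replicate(2000, diff_in_means(y, sample(treatment)))

# Share of shuffled effects at least as large (in absolute value) as the observed one.
p_value <- mean(abs(null_dist) >= abs(observed))
```

A small `p_value` means that an effect as large as the observed one would rarely arise from reshuffling alone, which is evidence against the null hypothesis.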
Résumé Experiment Example
Bertrand and Mullainathan studied racial discrimination in responses to job applications in Chicago and Boston. They sent 4,870 résumés, randomly assigning names associated with different racial groups.
The data are in the openintro package as an object called resume.
I will store it as resume_data.
Callbacks by Race
Remember, race of applicant is randomly assigned.
# A tibble: 2 × 2
  race   calls
  <chr>  <dbl>
1 black 0.0645
2 white 0.0965
Let’s save the means for white and black applicants.
Then calculate the treatment effect, which is the difference in means (white minus black).
[1] 0.03203285
Before formal tests, let’s look at the data: the estimates and the confidence intervals.
First, let’s make the CIs for the white applicants.
Now, let’s create the CIs for black applicants.
Now, let’s tidy the data for plotting.
# A tibble: 2 × 4
  race  meanCalls lower95 upper95
  <chr>     <dbl>   <dbl>   <dbl>
1 Black    0.0645  0.0550  0.0743
2 White    0.0965  0.0850  0.108
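The code that produced these intervals is not shown here. One common way to get a 95% interval for a mean of 0/1 outcomes is a percentile bootstrap; the sketch below assumes that approach, which may not be the one used for the table. It runs on simulated callbacks (the vector `calls`, its length, its callback probability, and the 1,000 resamples are all hypothetical), so its numbers will not match the table.

```r
set.seed(42)

# Hypothetical 0/1 callback indicator for one group (not the study data).
calls <- rbinom(2000, size = 1, prob = 0.08)

# Point estimate: the observed callback rate.
estimate <- mean(calls)

# Resample with replacement and record the mean of each resample.
boot_means <- replicate(1000, mean(sample(calls, replace = TRUE)))

# Percentile bootstrap 95% interval: the 2.5th and 97.5th percentiles.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
```

Running the same steps separately for each group would give one estimate and one interval per group, which is the shape of the tidy table used for plotting.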
Plot
library(ggplot2)

# Bar chart of mean callback rate by race, with 95% CI error bars
ggplot(plot_data, aes(y = meanCalls, x = race, ymin = lower95, ymax = upper95)) +
  geom_col(fill = "steelblue4") +
  geom_errorbar(width = .05) +
  theme_bw() +
  ylim(0, .15) +
  labs(x = "Race of Applicant", y = "Call Back Rate")
Is this evidence of racial discrimination?
What is the null hypothesis?
What is the alternative hypothesis?
How can we formally test the null hypothesis to decide whether to reject it?
Formal Hypothesis Test
Calculate the difference in means (White - Black)
Shuffle the race variable
Calculate the difference in means for the shuffled data
Repeat many times
This simulates the null distribution of differences in callback rates
Hypothetical Original Data
Applicant  Race   Callback
A          Black  Yes
B          Black  No
C          Black  No
D          White  Yes
E          White  No
F          White  No
Step 1: Calculate Original Difference in Callback Rates
Objective: Understand initial association between race and callback rates
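For the six hypothetical applicants in the table above, this step can be sketched in base R as follows. The object names (`toy`, `rates`, `observed_diff`) and the 1/0 coding of Yes/No are choices made for the illustration.

```r
# The six hypothetical applicants from the table above.
toy <- data.frame(
  applicant = c("A", "B", "C", "D", "E", "F"),
  race      = c("Black", "Black", "Black", "White", "White", "White"),
  callback  = c(1, 0, 0, 1, 0, 0)  # 1 = Yes, 0 = No
)

# Callback rate within each group.
rates <- tapply(toy$callback, toy$race, mean)

# Difference in callback rates (White - Black).
observed_diff <- rates[["White"]] - rates[["Black"]]
```

In this toy data each group has one callback out of three applicants, so both rates are 1/3 and the difference is 0.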