Who Is Killed by Police? part II (Chi-squared Test of Independence)
- Due May 10, 2024 by 11:59pm
- Points 5
- Available until May 17, 2024 at 11:59pm
We will use the same data set of 8,263 deaths at the hands of police between January 2013 and June 2020.
Now we'll look at the relationship (if any) between age and race/ethnicity of the deceased. This is bivariate data (two variables) and, since race/ethnicity is a categorical (and therefore discrete) variable, we will also treat age as a set of discrete categories: under 21, 21-44, and 45 and older. That is, young people; younger adults; older adults.
The youngest deaths in the data set were three one-year-olds. The oldest to die was over 100.
We can start with the same random sample of 100 that we used in part I. As expected, the range is considerably smaller: from 16 to 71 years old. We will omit people whose age and/or race/ethnicity is unknown, and for technical reasons we'll omit the two Asian people in the sample.
Contingency Tables and Probability
The data can be arranged in a "contingency table" as follows:
Age range | White | Black | Hispanic | Row totals |
---|---|---|---|---|
< 21 | 2 | 5 | 1 | 8 |
21-44 | 33 | 25 | 12 | 70 |
45 and over | 12 | 0 | 0 | 12 |
Column totals | 47 | 30 | 13 | 90 |
In the study of probability, we defined two events A and B as "independent" if P(A) = P(A|B) or, alternatively, if P(B) = P(B|A).
Applying this criterion to this table, it's clear right away that age and race/ethnicity are not independent in this sample. For example, let event A = person is 45 or over and let event B = person is Black. Then P(A) = 12/90, or approximately 0.13, but P(A|B) = 0/30 = 0.
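The comparison of P(A) and P(A|B) can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the variable names are our own choices, and the counts are copied from the cells of the first table.

```python
# Cell counts from the first contingency table (sample of 90 people).
# Outer keys are age ranges; inner keys are race/ethnicity.
table = {
    "< 21":        {"White": 2,  "Black": 5,  "Hispanic": 1},
    "21-44":       {"White": 33, "Black": 25, "Hispanic": 12},
    "45 and over": {"White": 12, "Black": 0,  "Hispanic": 0},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Event A: person is 45 or over. Event B: person is Black.
n_a = sum(table["45 and over"].values())
n_b = sum(row["Black"] for row in table.values())
n_a_and_b = table["45 and over"]["Black"]

p_a = n_a / grand_total        # P(A)
p_a_given_b = n_a_and_b / n_b  # P(A|B) = P(A and B) / P(B), using counts

print(f"P(A)   = {p_a:.3f}")
print(f"P(A|B) = {p_a_given_b:.3f}")
```

If the two variables were independent in the sample, the two printed values would be equal.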
Inferences from Samples to Populations
The data in the table are SAMPLE data taken from the POPULATION enumerated in the data set. (The data set is also, in some sense, a very large sample -- we don't know how many cases were not identified.)
And when we ask whether age and race/ethnicity are independent, we actually want to know about the POPULATION, not the sample. We need to make an INFERENCE from the sample to the population.
This will be a type of HYPOTHESIS TEST called a "Chi-squared Test of Independence." (You might need to review this topic before proceeding.) The null hypothesis is always that the two variables are independent. The alternative hypothesis is that there is a relationship between the two variables.
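As a brief review, the test compares each observed count with the count that would be expected if the null hypothesis were true. The symbols below are assumed for illustration and do not come from the data set: $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ is a row total, $C_j$ is a column total, $n$ is the grand total, and the table has $r$ rows and $c$ columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)
```

A large value of $\chi^2$ relative to the chi-squared distribution with $df$ degrees of freedom gives a small p-value, which is evidence against independence.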
- Use an automated calculator function (preferably) or hand calculations to conduct a Chi-squared Test of Independence using the data in the table. Use a 5% level of significance. What is the conclusion?
Now look at the "expected" values that the test computes for each cell of the table. There is a problem here: the conditions for conducting the test include expected values of at least 5 in all cells. This condition is not fulfilled. (It's not even close.) The test is not valid.
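For anyone checking this condition by hand rather than with a calculator, here is a minimal Python sketch. It assumes the usual rule that an expected count is (row total × column total) / grand total, and it recomputes the totals from the cell counts of the first table; the variable names are our own.

```python
# Observed counts from the first table.
# Rows: < 21, 21-44, 45 and over. Columns: White, Black, Hispanic.
row_labels = ["< 21", "21-44", "45 and over"]
col_labels = ["White", "Black", "Hispanic"]
observed = [
    [2, 5, 1],
    [33, 25, 12],
    [12, 0, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under the null hypothesis of independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Collect every cell whose expected count falls below 5.
small_cells = [
    (row_labels[i], col_labels[j], expected[i][j])
    for i in range(len(row_labels))
    for j in range(len(col_labels))
    if expected[i][j] < 5
]

for age, group, value in small_cells:
    print(f"{age:12s} {group:9s} expected = {value:.2f}")
print(f"{len(small_cells)} of {len(row_labels) * len(col_labels)} cells are below 5")
```

The same few lines can be reused on any later table by replacing `observed` and the two label lists.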
One possible solution is to simplify the table so there are fewer cells. For example, we could combine "Black" and "Hispanic." Another likely solution is to increase the sample size to 200. Again, we use the random-number generator and the chronological Excel list to choose another 100 people. Again, we will exclude individuals for whom age and/or race/ethnicity are unknown. We will continue to exclude the very small number of Asian, Pacific Islander and Native American people. The result is:
Age range | White | Black | Hispanic | Row totals |
---|---|---|---|---|
< 21 | 3 | 11 | 1 | 15 |
21-44 | 58 | 47 | 20 | 125 |
45 and over | 28 | 6 | 4 | 38 |
Column totals | 89 | 64 | 25 | 178 |
- Use an automated calculator function (preferably) or hand calculations to conduct a Chi-squared Test of Independence using the data in the table. CHECK TO SEE WHETHER THE CONDITIONS FOR EXPECTED VALUES ARE SATISFIED. If it is appropriate to do so, use a 5% level of significance and state your conclusion.
Unfortunately, one cell still has an unacceptably low expected value. Which is it?
Now let's combine the categories of "Black" and "Hispanic" to get:
Age range | White | Black or Hispanic | Row totals |
---|---|---|---|
< 21 | 3 | 12 | 15 |
21-44 | 58 | 67 | 125 |
45 and over | 28 | 10 | 38 |
Column totals | 89 | 89 | 178 |
- Use an automated calculator function (preferably) or hand calculations to conduct a Chi-squared Test of Independence using the data in the table. CHECK TO SEE WHETHER THE CONDITIONS FOR EXPECTED VALUES ARE SATISFIED. If it is appropriate to do so, use a 5% level of significance and state your conclusion.
- What are some follow-up questions you would like to ask about the data in this activity?
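If Python is your "automated calculator" for the tests above, the following is one possible sketch, not a required method. The function name `chi_squared_independence` and the counts in `hypothetical_table` are made up for illustration and are NOT taken from the data set. The p-value uses a closed-form shortcut that only holds when the degrees of freedom are even (which is the case for 3×3 and 3×2 tables); for other table shapes, a library routine such as `scipy.stats.chi2_contingency` may be a better choice.

```python
import math


def chi_squared_independence(table):
    """Return (statistic, df, p_value, expected) for a table of observed counts.

    `table` is a list of rows, each a list of counts. The p-value formula
    below is valid only for even degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected counts if the row and column variables are independent.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2 != 0:
        raise ValueError("this sketch only computes p-values for even df")

    # Upper-tail probability of a chi-squared distribution with even df.
    half = stat / 2
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )
    return stat, df, p_value, expected


# Hypothetical counts, for illustration only (2 rows, 3 columns).
hypothetical_table = [
    [20, 30, 50],
    [30, 30, 40],
]

stat, df, p_value, expected = chi_squared_independence(hypothetical_table)
all_cells_ok = all(e >= 5 for row in expected for e in row)

print(f"chi-squared = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
print("expected-value condition satisfied:", all_cells_ok)
print("reject independence at 5%:", all_cells_ok and p_value < 0.05)
```

To apply it to one of the tables in this activity, replace `hypothetical_table` with the cell counts only, leaving out the row and column totals.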
NOW participate in the DISCUSSION: Who is Killed by Police? Part II