California COVID-19 By The Numbers
- Due May 10, 2024 by 11:59pm
- Points 5
- Available until May 17, 2024 at 11:59pm
In this activity, we will look at whether the distribution of COVID-19 cases in California fits the demographics of the California population. This is one way to discover whether the disease has a "disproportionate impact" on specific groups. The following image was created on July 30, 2020:
Data from the California Department of Public Health. Accessed July 31, 2020.
Important Questions to Consider
Is it reasonable to treat this data set as a "sample" of a larger population? What would the (implied) population be? What type of sample would it be?
The table gives census (population) data for all persons in California known to have been diagnosed with Covid-19 through July 29, 2020. Public health authorities have a complete list, from which they could take a random sample -- but we don't.
We could treat these data as a "sample" of the population of all persons in California with Covid-19 diagnoses from the start of the pandemic through today. That number was about 1 million as of 11/11/2020. With roughly 485,000 cases, the "sample" would be nearly half of that "population", which is far too large a fraction of all California Covid-19 cases.
Or we could treat the data as a "sample" of the population of all persons in the US with Covid-19 diagnoses from the start of the pandemic through today. That was about 10.5 million cases as of 11/11/2020. That would fulfill conditions for statistical analysis. However, it would be a "convenience" sample and most likely not typical of the population of all cases. Why?
Finally, you will see in this activity that when the sample size is very large, even small differences between the observed and expected counts tend to produce statistically significant results.
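The effect of sample size can be illustrated with a minimal Python sketch. The proportions here are hypothetical and are not taken from the California data: a null hypothesis of a 50% / 50% split is compared with a made-up sample that splits 51% / 49%, at three different sample sizes. The cutoff 6.635 is the standard chi-square table value for 1 degree of freedom at the 1% level.

```python
# Hypothetical illustration: the same small gap between observed and expected
# proportions, evaluated at different sample sizes.

NULL_PROPS = [0.50, 0.50]      # proportions claimed by the null hypothesis
SAMPLE_PROPS = [0.51, 0.49]    # made-up sample proportions, one point off

# Chi-square critical value for df = 1 at the 1% significance level (table value).
CRITICAL_1PCT_DF1 = 6.635


def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def stat_for_sample_size(n):
    """Test statistic when n people show the same 51% / 49% split."""
    observed = [p * n for p in SAMPLE_PROPS]
    expected = [p * n for p in NULL_PROPS]
    return chi_square_stat(observed, expected)


for n in (500, 5_000, 500_000):
    stat = stat_for_sample_size(n)
    reject = stat > CRITICAL_1PCT_DF1
    print(f"n = {n:>7,}: chi-square = {stat:8.2f}, reject H0 at 1%? {reject}")
```

The statistic grows in direct proportion to the sample size, so the same one-point gap that is far from significant for 500 people becomes overwhelmingly significant for 500,000.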
Age and Covid-19
- How does the age distribution of Covid-19 cases in California compare with the population as a whole? We can use a chi-square Goodness of Fit hypothesis test with a 1% level of significance to answer that question. Remember to include all steps of any hypothesis test, including a clear statement of your conclusion. The null hypothesis is that the age distribution of Covid-19 cases fits the age distribution of the California population.
Source: the age distribution for California (accessed July 31, 2020). Note: 589 Covid patients whose age was not recorded (about 0.1% of total cases) are omitted.
Age group | CA population in millions | CA percent of total | Observed # of Covid-19 patients | Expected # out of 484,913 |
---|---|---|---|---|
0 - 17 | 9.14 | 23.6% | 43,880 | |
18 - 49 | 17.43 | 45.1% | 293,721 | |
50 - 64 | 7.122 | 18.4% | 93,273 | |
65 and over | 4.979 | 12.9% | 54,039 | |
Total | 38.671 | 100% | 484,913 | |
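If you want to check your hand calculations, the following is one possible Python sketch of the arithmetic, not a required method. It uses only the percentages and observed counts from the age table above. The function names are made up for illustration, and the cutoff 11.345 is the standard chi-square table value for 3 degrees of freedom at the 1% level. You still need to write out the hypotheses, conditions, and conclusion yourself.

```python
# Data copied from the age table above.
AGE_GROUPS = ["0 - 17", "18 - 49", "50 - 64", "65 and over"]
CA_PERCENT = [23.6, 45.1, 18.4, 12.9]           # percent of CA population
OBSERVED = [43_880, 293_721, 93_273, 54_039]    # observed Covid-19 patients

# Chi-square critical value for df = 3 at the 1% significance level (table value).
CRITICAL_1PCT_DF3 = 11.345


def expected_counts(percents, total):
    """Expected count per category if cases followed the population percents."""
    return [pct / 100 * total for pct in percents]


def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


total = sum(OBSERVED)
expected = expected_counts(CA_PERCENT, total)
stat = chi_square_stat(OBSERVED, expected)
df = len(OBSERVED) - 1  # degrees of freedom = number of categories - 1

for group, obs, exp in zip(AGE_GROUPS, OBSERVED, expected):
    print(f"{group:<12} observed = {obs:>8,}  expected = {exp:>12,.1f}")
print(f"chi-square = {stat:,.1f} with df = {df}")
print("Reject H0 at the 1% level?", stat > CRITICAL_1PCT_DF3)
```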
Gender and Covid-19
- Now we will look at gender. The data in the following table is taken from the same sources.
Gender | % of CA population | Observed # of cases | Expected # of cases |
---|---|---|---|
Female | 50.3% | 242,598 | |
Male | 49.6% | 238,861 | |
Unknown/Missing | | 4,043 | |
Total | 100% | 485,502 | |
Let's think for a minute about the categories here. Do you see any problems?
The "unknown/missing" category raises several questions. One is whether some of these people might have a gender identity other than F/M, but we have no way of knowing how the gender categorizations were made. Another is technical. The "Female" + "Male" percentages add up to 99.9%. Is this due to rounding errors, or might the remaining 0.1% belong in the "unknown/missing" category?
Before using a Chi-Square Goodness of Fit test, decide whether to include the "unknown/missing" category using the 0.1%, or whether to exclude those 4,043 people from the analysis. Then conduct the hypothesis test, writing down all the steps and using a 1% level of significance. State your conclusion in a complete sentence.
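The two choices can be set side by side in a short Python sketch, offered as one way to organize the arithmetic rather than as the expected answer. It rests on two assumptions that are not stated in the data source: in Option A the Female and Male percentages are rescaled so that they sum to 100%, and in Option B the leftover 0.1% is assigned to "unknown/missing". The cutoffs 6.635 (df = 1) and 9.210 (df = 2) are standard chi-square table values at the 1% level.

```python
# Data copied from the gender table above.
FEMALE_PCT, MALE_PCT = 50.3, 49.6
FEMALE_CASES, MALE_CASES, UNKNOWN_CASES = 242_598, 238_861, 4_043

# Chi-square critical values at the 1% level, keyed by degrees of freedom.
CRITICAL_1PCT = {1: 6.635, 2: 9.210}


def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


pct_sum = FEMALE_PCT + MALE_PCT  # 99.9, not 100

# Option A (assumption): drop the 4,043 unknown cases and rescale the two
# percentages so they sum to 100%.
observed_a = [FEMALE_CASES, MALE_CASES]
n_a = sum(observed_a)
expected_a = [FEMALE_PCT / pct_sum * n_a, MALE_PCT / pct_sum * n_a]
stat_a = chi_square_stat(observed_a, expected_a)

# Option B (assumption): keep the unknown cases and give that category the
# leftover 0.1% of the population.
observed_b = [FEMALE_CASES, MALE_CASES, UNKNOWN_CASES]
n_b = sum(observed_b)
unknown_pct = 100 - pct_sum
expected_b = [pct / 100 * n_b for pct in (FEMALE_PCT, MALE_PCT, unknown_pct)]
stat_b = chi_square_stat(observed_b, expected_b)

print(f"Option A: chi-square = {stat_a:,.3f}, df = 1, "
      f"reject H0? {stat_a > CRITICAL_1PCT[1]}")
print(f"Option B: chi-square = {stat_b:,.3f}, df = 2, "
      f"reject H0? {stat_b > CRITICAL_1PCT[2]}")
```

Comparing the two results shows how much the decision about the "unknown/missing" category can matter.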
Race/Ethnicity and Covid-19
- Now let's look at "Race and Ethnicity."
Race/Ethnicity | Number of cases | Percent of cases | Number of deaths | Percent of deaths | Percent of CA population |
---|---|---|---|---|---|
Latino/a/x | 181,443 | 57.0 | 4,043 | 46.1 | 38.9 |
White | 55,238 | 17.4 | 2,633 | 30.0 | 36.6 |
Asian | 17,323 | 5.4 | 1,109 | 12.7 | 15.4 |
African-American/Black | 13,622 | 4.3 | 734 | 8.4 | 6.0 |
Multi-race | 2,544 | 0.8 | 49 | 0.6 | 2.2 |
Indigenous North American OR Alaska Native | 747 | 0.2 | 33 | 0.4 | 0.5 |
Native Hawaiian or other Pacific Islander | 1,877 | 0.6 | 47 | 0.5 | 0.3 |
Other | 45,547 | 14.3 | 118 | 1.3 | 0.0 |
TOTAL | 318,341 | 100 | 8,766 | 100 | 100 |
What type of hypothesis test could help us determine whether or not the distribution of California COVID-19 cases fits the race/ethnicity distribution of the California population? What conditions would have to be met? Conduct the test, using a 1% level of significance, showing all steps and stating your conclusion in a complete sentence.
Repeat the analysis for the distribution of California Covid-19 deaths.
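One condition worth checking before running either test is the size of the expected counts. The sketch below is a hedged illustration, assuming the common textbook rule of thumb that every expected count should be at least 5. The function name `small_expected` is made up for this example. It lists any category whose expected count falls below that threshold, using the "Percent of CA population" column and the two totals from the table above.

```python
# Data copied from the race/ethnicity table above.
CATEGORIES = [
    "Latino/a/x",
    "White",
    "Asian",
    "African-American/Black",
    "Multi-race",
    "Indigenous North American OR Alaska Native",
    "Native Hawaiian or other Pacific Islander",
    "Other",
]
POPULATION_PCT = [38.9, 36.6, 15.4, 6.0, 2.2, 0.5, 0.3, 0.0]
TOTAL_CASES = 318_341
TOTAL_DEATHS = 8_766

# Assumed rule of thumb: each expected count should be at least 5.
MIN_EXPECTED = 5


def small_expected(categories, percents, total, minimum=MIN_EXPECTED):
    """Return (category, expected count) pairs that fall below the minimum."""
    flagged = []
    for name, pct in zip(categories, percents):
        expected = pct / 100 * total
        if expected < minimum:
            flagged.append((name, expected))
    return flagged


for label, total in (("cases", TOTAL_CASES), ("deaths", TOTAL_DEATHS)):
    flagged = small_expected(CATEGORIES, POPULATION_PCT, total)
    print(f"Categories with expected {label} below {MIN_EXPECTED}: {flagged}")
```

A category listed at 0.0% of the population has an expected count of 0, so its (O - E)^2 / E term cannot be computed. How to handle such a category, for example by leaving it out or combining it with another, is a judgment call you should explain in your write-up.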
- What social or public-health policy implications do you think might follow from the analysis you have done in this activity? What further questions does it raise in your mind?
Post comments, answers to questions, etc. in the California COVID-19 By The Numbers Discussion.