In AP Statistics, analyzing two categorical variables together shows whether, and how, they are related. The main tools are contingency tables, marginal, joint, and conditional distributions, and the chi-square test of independence. With them you can describe how the categories of one variable are distributed across the categories of the other and test whether an observed association is statistically significant. Mastering these techniques is important for interpreting data and drawing sound conclusions on the AP Statistics exam.
Learning Objectives
After studying this topic, you should be able to build and read a contingency table, compute marginal, joint, and conditional distributions from it, and carry out a chi-square test of independence to decide whether two categorical variables are significantly associated.
Contingency Tables
- Definition: A contingency table (or cross-tabulation) displays the frequency distribution of two categorical variables.
- Structure: Rows represent categories of one variable, and columns represent categories of another variable.
- Purpose: Helps to examine the relationship between two categorical variables.
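As a rough illustration of how a contingency table is built from raw responses, the sketch below tallies (gender, subject) pairs. The individual records are hypothetical: they are simply expanded from the survey counts used throughout this guide, and the variable names are assumptions made for the example.

```python
from collections import Counter

# Survey counts used in this guide; the individual records below are
# hypothetical and exist only to demonstrate the tallying step.
counts = {
    ("Male", "Math"): 10, ("Male", "Science"): 20, ("Male", "English"): 15,
    ("Female", "Math"): 20, ("Female", "Science"): 15, ("Female", "English"): 20,
}
records = [pair for pair, n in counts.items() for _ in range(n)]

# Cross-tabulate: count how often each (gender, subject) pair occurs.
tally = Counter(records)

genders = ["Male", "Female"]               # row categories
subjects = ["Math", "Science", "English"]  # column categories
table = [[tally[(g, s)] for s in subjects] for g in genders]

for g, row in zip(genders, table):
    print(g, row)
```

Each inner list is one row of the table, with one entry per column category.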
Marginal Distribution
- Definition: The distribution of each categorical variable separately.
- Calculation: Sum the frequencies in each row and column of the contingency table.
- Example:
Gender | Math | Science | English | Total |
Male | 10 | 20 | 15 | 45 |
Female | 20 | 15 | 20 | 55 |
Total | 30 | 35 | 35 | 100 |
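The row and column totals in the table above can be reproduced with a few sums. This is a minimal sketch using plain dictionaries; the names `row_totals`, `col_totals`, and so on are chosen for illustration only.

```python
# Observed counts from the table above (rows: gender, columns: subject).
table = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}
subjects = ["Math", "Science", "English"]

# Row totals give the marginal counts for gender.
row_totals = {g: sum(cells.values()) for g, cells in table.items()}

# Column totals give the marginal counts for favorite subject.
col_totals = {s: sum(table[g][s] for g in table) for s in subjects}

grand_total = sum(row_totals.values())

# Dividing each total by the grand total gives marginal proportions.
row_props = {g: n / grand_total for g, n in row_totals.items()}
col_props = {s: n / grand_total for s, n in col_totals.items()}

print(row_totals, col_totals, grand_total)
```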
Joint Distribution
- Definition: The distribution of both categorical variables together.
- Calculation: Divide each cell frequency by the total number of observations.
- Example:
Gender | Math | Science | English | Total |
Male | 0.10 | 0.20 | 0.15 | 0.45 |
Female | 0.20 | 0.15 | 0.20 | 0.55 |
Total | 0.30 | 0.35 | 0.35 | 1.00 |
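The joint proportions above come from dividing every cell count by the grand total. A small sketch of that step, assuming the counts are stored as a list of rows:

```python
# Observed counts: rows are Male, Female; columns are Math, Science, English.
observed = [[10, 20, 15], [20, 15, 20]]

# Total number of observations in the table.
n = sum(sum(row) for row in observed)

# Joint distribution: each cell count divided by the total.
joint = [[cell / n for cell in row] for row in observed]

for row in joint:
    print([round(p, 2) for p in row])
```

The joint proportions should always add up to 1, which is a quick way to check the arithmetic.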
Conditional Distribution
- Definition: The distribution of one categorical variable for each level of the other categorical variable.
- Calculation: Divide the cell frequency by the row or column total.
- Example:
- For Males:
\[
\text{Math: } \dfrac{10}{45} \approx 0.222
\]
\[
\text{Science: } \dfrac{20}{45} \approx 0.444
\]
\[
\text{English: } \dfrac{15}{45} \approx 0.333
\]
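The same row-wise calculation can be applied to every row at once. The helper below is a hypothetical function written for this sketch, not part of any library; it divides each cell by its row total.

```python
def conditional_by_row(table):
    """Return the distribution of the column variable within each row."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        # Each cell count divided by its own row total.
        result[row_label] = {col: count / row_total for col, count in cells.items()}
    return result


table = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}

cond = conditional_by_row(table)
for gender, dist in cond.items():
    print(gender, {subject: round(p, 3) for subject, p in dist.items()})
```

Each row's proportions sum to 1, so the Male and Female distributions can be compared directly even though the group sizes differ.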
Chi-Square Test of Independence
- Definition: A statistical test to determine if there is a significant association between two categorical variables.
- Hypotheses:
- \(H_0\): The two variables are independent.
- \(H_1\): The two variables are not independent.
- Formula:
- Expected Frequency (the count expected in a cell if the two variables are independent):
\[
E_i = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}
\]
- Chi-Square Statistic (summed over every cell in the table):
\[
\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}
\]
- Worked values for the Male row of the table above:
\[
E_{\text{Math, Male}} = \dfrac{45 \times 30}{100} = 13.5
\]
\[
E_{\text{Science, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]
\[
E_{\text{English, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]
\[
\chi^2 = \dfrac{(10 - 13.5)^2}{13.5} + \dfrac{(20 - 15.75)^2}{15.75} + \dfrac{(15 - 15.75)^2}{15.75} + \cdots
\]
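To see the expected-frequency rule and the chi-square sum applied to all six cells rather than only the Male row, here is a minimal sketch. The variable names are assumptions for this example; the degrees-of-freedom line uses the standard (rows - 1) × (columns - 1) rule.

```python
# Observed counts: rows are Male, Female; columns are Math, Science, English.
observed = [[10, 20, 15], [20, 15, 20]]

row_totals = [sum(row) for row in observed]        # 45, 55
col_totals = [sum(col) for col in zip(*observed)]  # 30, 35, 35
grand_total = sum(row_totals)                      # 100

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi_square, 2), df)
```

For this table the statistic comes out to roughly 3.80 with 2 degrees of freedom.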
Examples
Example 1: Contingency Table
- Data: Survey of 100 students on gender (Male, Female) and favorite subject (Math, Science, English).
- Table:
Gender | Math | Science | English | Total |
Male | 10 | 20 | 15 | 45 |
Female | 20 | 15 | 20 | 55 |
Total | 30 | 35 | 35 | 100 |
Example 2: Marginal Distribution
- Calculation: Sum the frequencies in each row and column of the contingency table.
- Result:
Gender | Math | Science | English | Total |
Male | 10 | 20 | 15 | 45 |
Female | 20 | 15 | 20 | 55 |
Total | 30 | 35 | 35 | 100 |
Example 3: Joint Distribution
- Calculation: Divide each cell frequency by the total number of observations.
- Result:
Gender | Math | Science | English | Total |
Male | 0.10 | 0.20 | 0.15 | 0.45 |
Female | 0.20 | 0.15 | 0.20 | 0.55 |
Total | 0.30 | 0.35 | 0.35 | 1.00 |
Example 4: Conditional Distribution
- Calculation: Divide the cell frequency by the row or column total.
- Result:
- For Males:
\[
\text{Math: } \dfrac{10}{45} \approx 0.222
\]
\[
\text{Science: } \dfrac{20}{45} \approx 0.444
\]
\[
\text{English: } \dfrac{15}{45} \approx 0.333
\]
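The calculation rule says to divide by the row or column total, and the result above uses the row total. As a complementary sketch, the hypothetical helper below conditions on a column instead, giving the gender distribution among students who chose a given subject.

```python
def conditional_by_column(table, column):
    """Return the distribution of the row variable within one column."""
    col_total = sum(cells[column] for cells in table.values())
    # Each cell count in the chosen column divided by that column's total.
    return {row: cells[column] / col_total for row, cells in table.items()}


table = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}

math_dist = conditional_by_column(table, "Math")
print({gender: round(p, 3) for gender, p in math_dist.items()})
```

Among the 30 students who prefer Math, 10/30 are male and 20/30 are female, which answers a different question than the row-wise distribution for Males.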
Example 5: Chi-Square Test of Independence
- Data: Use the contingency table from Example 1.
- Calculation:
- Expected Frequencies:
\[
E_{\text{Math, Male}} = \dfrac{45 \times 30}{100} = 13.5
\]
\[
E_{\text{Science, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]
\[
E_{\text{English, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]
- Chi-Square Statistic:
\[
\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(10 - 13.5)^2}{13.5} + \dfrac{(20 - 15.75)^2}{15.75} + \dfrac{(15 - 15.75)^2}{15.75} + \cdots
\]
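- Add the corresponding terms for the three Female cells (expected frequencies 16.5, 19.25, and 19.25) to complete the sum: \(\chi^2 \approx 3.80\), with \((2 - 1)(3 - 1) = 2\) degrees of freedom.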
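Once the sum is complete, the statistic is usually converted to a p-value and compared with a significance level. The sketch below makes two assumptions that are not stated in the example: a significance level of 0.05, and the closed-form p-value exp(-χ²/2), which is valid only when the test has exactly 2 degrees of freedom (as it does for a 2 × 3 table). For other table sizes a chi-square table or statistical software would be needed.

```python
import math

# Observed counts from Example 1 (rows: Male, Female).
observed = [[10, 20, 15], [20, 15, 20]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic over all six cells.
chi_square = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n  # expected count for this cell
        chi_square += (observed[i][j] - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)
if df != 2:
    raise ValueError("the closed-form p-value below only holds for df = 2")

# For df = 2, the upper-tail chi-square probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

alpha = 0.05  # assumed significance level, not given in the example
reject_h0 = p_value < alpha

print(round(chi_square, 2), round(p_value, 3), reject_h0)
```

Under these assumptions the p-value is about 0.15, which is larger than 0.05, so the data do not provide significant evidence of an association between gender and favorite subject.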
Multiple Choice Questions
Question 1: What does a contingency table display?
A. The frequency distribution of a single categorical variable
B. The relationship between two categorical variables
C. The distribution of a quantitative variable
D. The correlation between two quantitative variables
Answer: B. The relationship between two categorical variables
Explanation: A contingency table displays the frequency distribution of two categorical variables, showing their relationship.
Question 2: What is the purpose of the chi-square test of independence?
A. To determine the mean of two categorical variables
B. To find the correlation between two quantitative variables
C. To test if there is a significant association between two categorical variables
D. To compare the variances of two samples
Answer: C. To test if there is a significant association between two categorical variables
Explanation: The chi-square test of independence determines if there is a significant association between two categorical variables.
Question 3: How do you calculate the expected frequency in a contingency table?
A. Divide the cell frequency by the total number of observations
B. Multiply the row total by the column total and divide by the grand total
C. Subtract the mean from the observed frequency
D. Sum the frequencies of all cells in the table
Answer: B. Multiply the row total by the column total and divide by the grand total
Explanation: The expected frequency in a contingency table is calculated by multiplying the row total by the column total and then dividing by the grand total.