Calculating Statistics For 2 Categorical Variables

In AP Statistics, calculating statistics for two categorical variables is essential for understanding relationships and dependencies between them. This topic involves using contingency tables, marginal and joint distributions, conditional distributions, and the chi-square test of independence. These tools allow you to analyze the interaction between two categorical variables, identify patterns, and determine if a significant association exists. Mastering these techniques is crucial for effectively interpreting data and drawing meaningful conclusions in the AP Statistics exam.

Learning Objectives

After working through this section, you should be able to build and read a contingency table, compute marginal, joint, and conditional distributions from it, and carry out a chi-square test of independence to decide whether two categorical variables are associated. These are the skills the AP Statistics exam expects when it asks you to interpret two-way categorical data.

Contingency Tables

  • Definition: A contingency table (or cross-tabulation) displays the frequency distribution of two categorical variables.
  • Structure: Rows represent categories of one variable, and columns represent categories of another variable.
  • Purpose: Helps to examine the relationship between two categorical variables.
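As a minimal sketch of how such a table is tallied from raw responses, the snippet below counts (row, column) pairs. The six `records` are hypothetical illustration data, not the survey used later in this guide.

```python
from collections import Counter

# Hypothetical raw survey records: (gender, favorite subject) per student.
records = [
    ("Male", "Math"), ("Female", "Science"), ("Male", "Science"),
    ("Female", "Math"), ("Female", "English"), ("Male", "Science"),
]

# Count how many records fall into each (row, column) combination.
cell_counts = Counter(records)

rows = sorted({gender for gender, _ in records})
cols = sorted({subject for _, subject in records})

# table[row][col] is the observed frequency for that cell (0 if never seen).
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, table[r])
```

Each cell holds the number of individuals who fall into that combination of categories, so the cells always add up to the number of records.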

Marginal Distribution

  • Definition: The distribution of each categorical variable separately.
  • Calculation: Sum the frequencies in each row and column of the contingency table.
  • Example:
              Math    Science    English    Total
    Male        10         20         15       45
    Female      20         15         20       55
    Total       30         35         35      100
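The row and column sums can be checked with a short script. This is only a sketch: the dictionary layout and the names `observed`, `row_totals`, and `col_totals` are illustrative choices, while the counts are those of the example table.

```python
# Observed counts from the example table (rows: gender, columns: subject).
observed = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}

# Row totals give the marginal distribution of gender.
row_totals = {g: sum(cells.values()) for g, cells in observed.items()}

# Column totals give the marginal distribution of favorite subject.
col_totals = {}
for cells in observed.values():
    for subject, count in cells.items():
        col_totals[subject] = col_totals.get(subject, 0) + count

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The totals in the margins of the table (45 and 55 for gender; 30, 35, and 35 for subject) are exactly these sums.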

Joint Distribution

  • Definition: The distribution of both categorical variables together.
  • Calculation: Divide each cell frequency by the total number of observations.
  • Example:
              Math    Science    English    Total
    Male      0.10       0.20       0.15     0.45
    Female    0.20       0.15       0.20     0.55
    Total     0.30       0.35       0.35     1.00
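A small sketch of the same calculation in code, assuming the counts from the example table; the variable names are illustrative.

```python
# Observed counts from the example table (rows: gender, columns: subject).
observed = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}

# Grand total: the number of observations in the whole table.
n = sum(sum(cells.values()) for cells in observed.values())

# Joint proportion for each cell: cell count divided by the grand total.
joint = {
    g: {s: count / n for s, count in cells.items()}
    for g, cells in observed.items()
}

for g, cells in joint.items():
    print(g, {s: round(p, 2) for s, p in cells.items()})
```

Because every observation lands in exactly one cell, the joint proportions add up to 1.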

Conditional Distribution

  • Definition: The distribution of one categorical variable for each level of the other categorical variable.
  • Calculation: Divide the cell frequency by the row or column total.
  • Example:
    • For Males:

\[
P(\text{Math} \mid \text{Male}) = \dfrac{10}{45} \approx 0.222
\]
\[
P(\text{Science} \mid \text{Male}) = \dfrac{20}{45} \approx 0.444
\]
\[
P(\text{English} \mid \text{Male}) = \dfrac{15}{45} \approx 0.333
\]
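The sketch below conditions in both directions, first on gender (dividing by row totals) and then on subject (dividing by column totals). The counts come from the example table; the names `given_gender` and `given_subject` are illustrative.

```python
# Observed counts from the example table (rows: gender, columns: subject).
observed = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}

# Condition on gender: divide each cell by its row total.
given_gender = {
    g: {s: count / sum(cells.values()) for s, count in cells.items()}
    for g, cells in observed.items()
}

# Condition on subject: divide each cell by its column total.
subjects = list(observed["Male"])
col_totals = {s: sum(observed[g][s] for g in observed) for s in subjects}
given_subject = {
    s: {g: observed[g][s] / col_totals[s] for g in observed}
    for s in subjects
}

for s, p in given_gender["Male"].items():
    print(f"P({s} | Male) = {p:.3f}")
for g, p in given_subject["Math"].items():
    print(f"P({g} | Math) = {p:.3f}")
```

Each conditional distribution sums to 1 within its own row or column, which is a quick way to check that the right total was used as the denominator.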

Chi-Square Test of Independence

  • Definition: A statistical test to determine if there is a significant association between two categorical variables.
  • Hypotheses:
    • \(H_0\): The two variables are independent.
    • \(H_1\): The two variables are not independent.
  • Formula:

\[
\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}
\]

    where \(O_i\) is the observed frequency and \(E_i\) is the expected frequency in cell \(i\), and the sum runs over every cell of the table.

  • Expected Frequency:

\[
E_i = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}
\]

  • Expected frequencies for the Male row of the example table:

\[
E_{\text{Math, Male}} = \dfrac{45 \times 30}{100} = 13.5
\]

\[
E_{\text{Science, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]

\[
E_{\text{English, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]

  • Chi-square statistic for the example table:

\[
\chi^2 = \dfrac{(10 - 13.5)^2}{13.5} + \dfrac{(20 - 15.75)^2}{15.75} + \dfrac{(15 - 15.75)^2}{15.75} + \cdots
\]
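The expected-frequency rule and the chi-square sum can be applied to every cell at once. The following is a minimal sketch using only the Python standard library and the counts from the example table; the variable names are illustrative.

```python
# Observed counts from the example table (rows: gender, columns: subject).
observed = {
    "Male":   {"Math": 10, "Science": 20, "English": 15},
    "Female": {"Math": 20, "Science": 15, "English": 20},
}

row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
subjects = list(next(iter(observed.values())))
col_totals = {s: sum(observed[g][s] for g in observed) for s in subjects}
n = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    g: {s: row_totals[g] * col_totals[s] / n for s in subjects}
    for g in observed
}

# Chi-square statistic: sum of (O - E)^2 / E over every cell in the table.
chi_square = sum(
    (observed[g][s] - expected[g][s]) ** 2 / expected[g][s]
    for g in observed
    for s in subjects
)

print(expected["Male"])
print(expected["Female"])
print(round(chi_square, 2))
```

The Male row of `expected` reproduces the hand-computed values 13.5, 15.75, and 15.75, and the loop covers the Female row in the same way.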

Examples

Example 1: Contingency Table

  • Data: Survey of 100 students on gender (Male, Female) and favorite subject (Math, Science, English).
  • Table:
              Math    Science    English    Total
    Male        10         20         15       45
    Female      20         15         20       55
    Total       30         35         35      100

Example 2: Marginal Distribution

  • Calculation: Sum the frequencies in each row and column of the contingency table.
  • Result:
              Math    Science    English    Total
    Male        10         20         15       45
    Female      20         15         20       55
    Total       30         35         35      100

Example 3: Joint Distribution

  • Calculation: Divide each cell frequency by the total number of observations.
  • Result:
              Math    Science    English    Total
    Male      0.10       0.20       0.15     0.45
    Female    0.20       0.15       0.20     0.55
    Total     0.30       0.35       0.35     1.00

Example 4: Conditional Distribution

  • Calculation: Divide the cell frequency by the row or column total.
  • Result:
    • For Males:

\[
P(\text{Math} \mid \text{Male}) = \dfrac{10}{45} \approx 0.222
\]
\[
P(\text{Science} \mid \text{Male}) = \dfrac{20}{45} \approx 0.444
\]
\[
P(\text{English} \mid \text{Male}) = \dfrac{15}{45} \approx 0.333
\]

Example 5: Chi-Square Test of Independence

  • Data: Use the contingency table from Example 1.
  • Calculation:
    • Expected Frequencies:

\[
E_{\text{Math, Male}} = \dfrac{45 \times 30}{100} = 13.5
\]
\[
E_{\text{Science, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]
\[
E_{\text{English, Male}} = \dfrac{45 \times 35}{100} = 15.75
\]

  • Chi-Square Statistic:

\[
\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(10 - 13.5)^2}{13.5} + \dfrac{(20 - 15.75)^2}{15.75} + \dfrac{(15 - 15.75)^2}{15.75} + \cdots
\]

  • Calculate the sum to get the chi-square value.
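
As a worked sketch of that final step, the remaining terms come from the Female row of the Example 1 table, using the same expected-frequency rule. The degrees of freedom and the 0.05 significance level below are the conventional choices for this test and are assumptions here, since the example does not state them.

\[
E_{\text{Math, Female}} = \dfrac{55 \times 30}{100} = 16.5, \quad
E_{\text{Science, Female}} = \dfrac{55 \times 35}{100} = 19.25, \quad
E_{\text{English, Female}} = \dfrac{55 \times 35}{100} = 19.25
\]

\[
\chi^2 = \dfrac{(10 - 13.5)^2}{13.5} + \dfrac{(20 - 15.75)^2}{15.75} + \dfrac{(15 - 15.75)^2}{15.75} + \dfrac{(20 - 16.5)^2}{16.5} + \dfrac{(15 - 19.25)^2}{19.25} + \dfrac{(20 - 19.25)^2}{19.25} \approx 3.80
\]

\[
df = (\text{rows} - 1)(\text{columns} - 1) = (2 - 1)(3 - 1) = 2
\]

With 2 degrees of freedom, the critical value at the 0.05 level is about 5.99. Since 3.80 is smaller, these data would not lead you to reject \(H_0\); the survey does not show a significant association between gender and favorite subject at that level.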

Multiple Choice Questions

Question 1: What does a contingency table display?

A. The frequency distribution of a single categorical variable
B. The relationship between two categorical variables
C. The distribution of a quantitative variable
D. The correlation between two quantitative variables

Answer: B. The relationship between two categorical variables

Explanation: A contingency table displays the frequency distribution of two categorical variables, showing their relationship.

Question 2: What is the purpose of the chi-square test of independence?

A. To determine the mean of two categorical variables
B. To find the correlation between two quantitative variables
C. To test if there is a significant association between two categorical variables
D. To compare the variances of two samples

Answer: C. To test if there is a significant association between two categorical variables

Explanation: The chi-square test of independence determines if there is a significant association between two categorical variables.

Question 3: How do you calculate the expected frequency in a contingency table?

A. Divide the cell frequency by the total number of observations
B. Multiply the row total by the column total and divide by the grand total
C. Subtract the mean from the observed frequency
D. Sum the frequencies of all cells in the table

Answer: B. Multiply the row total by the column total and divide by the grand total

Explanation: The expected frequency in a contingency table is calculated by multiplying the row total by the column total and then dividing by the grand total.