What is a chi-square test?

A chi-square (χ²) test is a statistical test used to determine if there is a significant difference between observed and expected frequencies in one or more categories. The goodness of fit test asks: "Does my observed data fit the expected distribution?" For example, you might test whether a die is fair by rolling it 60 times and comparing observed face counts (O) to expected counts of 10 each (E). A large χ² means the observed data is unlikely under the assumed distribution.

How do you calculate the chi-square statistic?

χ² = Σ(Oᵢ − Eᵢ)² / Eᵢ, where Oᵢ is the observed frequency in category i and Eᵢ is the expected frequency. Sum this ratio across all categories. For example, if you observe [15, 20, 25] but expect [20, 20, 20]: χ² = (15−20)²/20 + (20−20)²/20 + (25−20)²/20 = 25/20 + 0 + 25/20 = 2.50. A larger χ² means larger discrepancies between observed and expected counts.
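
The calculation above can be sketched in a few lines of Python. The function name chi_square_statistic is only illustrative, not part of the calculator; the counts are the ones from the example.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Counts from the example: observed [15, 20, 25], expected [20, 20, 20]
chi2 = chi_square_statistic([15, 20, 25], [20, 20, 20])
print(chi2)  # 2.5
```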

What is the p-value in a chi-square test?

The p-value is the probability of obtaining a chi-square statistic at least as large as the observed value, assuming the null hypothesis is true. A small p-value (typically p < 0.05) means data this far from the expected counts would rarely arise by chance if the assumed distribution were correct, so you reject the null hypothesis. For example, χ² = 7.82 with df = 2 gives p ≈ 0.020, which is statistically significant at α = 0.05: there is evidence that the data does not fit the expected distribution.
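
For the special case df = 2, the upper-tail probability of the chi-square distribution has the simple closed form exp(−χ²/2), which makes the example easy to check by hand. The sketch below covers only that case; the helper name p_value_df2 is illustrative, and other df values need the general method.

```python
import math

def p_value_df2(chi2):
    """Upper-tail probability of a chi-square distribution with df = 2 only."""
    return math.exp(-chi2 / 2)

p = p_value_df2(7.82)
print(round(p, 3))  # 0.02
```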

What are degrees of freedom in a chi-square test?

For a goodness of fit test, degrees of freedom (df) = number of categories − 1. With 3 categories, df = 2; with 6 categories (like a die), df = 5. The "minus 1" reflects that once you know all but one category's count, the last one is determined (since all observed counts must sum to the total). The chi-square distribution changes shape with df — larger df requires a larger χ² to achieve the same p-value. Our calculator automatically computes df from the number of valid category rows.
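
As a rough sketch of how df could be derived from the category rows: the code below assumes a row is "valid" when both values are present and the expected value is positive. That definition, and the name degrees_of_freedom, are assumptions for illustration rather than the calculator's actual rules.

```python
def degrees_of_freedom(rows):
    """rows: list of (observed, expected) pairs; None marks an empty cell."""
    valid = [
        (o, e) for o, e in rows
        if o is not None and e is not None and e > 0
    ]
    if len(valid) < 2:
        raise ValueError("need at least two valid categories")
    return len(valid) - 1

# Three filled rows and one blank row (hypothetical input)
rows = [(15, 20), (20, 20), (25, 20), (None, None)]
print(degrees_of_freedom(rows))  # 2
```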

When do you reject the null hypothesis?

Reject H₀ (the null hypothesis that the data fits the expected distribution) when the p-value is less than your chosen significance level (α). At α = 0.05: reject if p < 0.05. At α = 0.01: reject if p < 0.01. For example, if χ² = 11.07 with df = 3, the p-value ≈ 0.011. At α = 0.05, reject H₀ (result is significant). At α = 0.01, fail to reject H₀ (not significant at the stricter threshold). The calculator shows both decisions simultaneously.
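
The decision rule can be written as a small helper that reports the outcome at several significance levels at once. This is a minimal sketch; the function name and the wording of the returned labels are illustrative.

```python
def decisions(p_value, alphas=(0.05, 0.01)):
    """Map each significance level to the test decision for this p-value."""
    return {
        alpha: "reject H0" if p_value < alpha else "fail to reject H0"
        for alpha in alphas
    }

# p-value from the example above (chi-square = 11.07, df = 3)
result = decisions(0.011)
print(result)  # {0.05: 'reject H0', 0.01: 'fail to reject H0'}
```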

What is a goodness of fit test?

A goodness of fit test assesses how well observed data matches a hypothesized theoretical distribution. Examples: (1) Testing whether a die is fair, with expected frequency = total rolls / 6 per face; (2) Testing whether births are equally distributed across days of the week; (3) Testing whether the distribution of M&M colors matches the manufacturer's claimed percentages; (4) Testing whether a dataset is normally distributed, after grouping the values into bins. As a rule of thumb, expected frequencies should be at least 5 per category for the chi-square approximation to be valid.

When should I use a chi-square test?

Use a chi-square goodness of fit test when you have categorical data and want to compare observed counts to theoretically expected counts. Common scenarios: checking whether survey responses fit a uniform distribution, testing whether a genetic experiment matches expected Mendelian ratios (e.g., 3:1 dominant:recessive), and verifying that a manufacturing process produces defects at the claimed rate. The test requires counts (not percentages) and expected values of at least 5 per category.
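
When the theory is stated as a ratio, such as the 3:1 Mendelian ratio, the expected counts come from splitting the total sample size in that proportion. The sketch below assumes a hypothetical total of 400 offspring; the function name is illustrative.

```python
def expected_from_ratio(ratio, total):
    """Split a total count according to a ratio such as 3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

# Hypothetical sample of 400 offspring under a 3:1 dominant:recessive ratio
expected = expected_from_ratio([3, 1], 400)
print(expected)  # [300.0, 100.0]
```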

What's the difference between chi-square goodness of fit and chi-square test of independence?

The goodness of fit test compares one set of observed frequencies to a theoretical expected distribution (one variable, multiple categories). The test of independence uses a two-way contingency table to test whether two categorical variables are related — for example, whether gender and political preference are independent. This calculator performs the goodness of fit test. For the independence test, you would need a contingency table chi-square calculator.

What does chi-square p < 0.05 mean?

A p-value below 0.05 means the observed data differs significantly from what the null hypothesis predicts at the 5% significance level. You reject the null hypothesis and conclude there is evidence that the data does not fit the expected distribution. For example, if you test whether a die is fair and get p = 0.02, you have evidence that the die is biased. Remember that statistical significance does not tell you why the data differs, only that a difference this large is unlikely to be due to chance alone.

Chi Square Calculator — Chi-Square Test & P-Value

Calculates the chi-square statistic and p-value for a goodness of fit test — enter observed and expected frequencies per category.

How to Use the Chi-Square Calculator

This chi square calculator computes the χ² statistic and p-value for a goodness of fit test. Enter your Observed (O) and Expected (E) frequencies for each category. Start with the 3 pre-filled rows and add more using the "+ Add category" button; you can have as many categories as needed. The calculator automatically computes the chi-square statistic (χ²), degrees of freedom, and p-value, and displays whether the result is statistically significant at both α = 0.05 and α = 0.01.

The column on the right shows each category's contribution to χ² — (O − E)² / E — so you can identify which categories drive the most discrepancy. Expected values must be positive (greater than 0); the chi-square approximation works best when all expected values are at least 5. For related probability calculations, see our binomial distribution calculator.

The Chi-Square Formula

The chi-square goodness of fit statistic is:

χ² = Σ(Oᵢ − Eᵢ)² / Eᵢ

Where Oᵢ is the observed count and Eᵢ is the expected count for each category i. The degrees of freedom for a goodness of fit test is:

df = k − 1

Where k is the number of categories.

Worked Example: Testing a Die for Fairness

A die is rolled 120 times. A fair die should show each face 120/6 = 20 times. Observed counts: {Face 1: 18, Face 2: 22, Face 3: 16, Face 4: 25, Face 5: 19, Face 6: 20}.

χ² = (18−20)²/20 + (22−20)²/20 + (16−20)²/20 + (25−20)²/20 + (19−20)²/20 + (20−20)²/20 = 0.2 + 0.2 + 0.8 + 1.25 + 0.05 + 0 = 2.50

With df = 5 and χ² = 2.50, the p-value ≈ 0.776. Since p > 0.05, we fail to reject H₀: the data gives no evidence that the die is unfair.
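
The die example can be reproduced with the Python standard library. This sketch computes each face's contribution, the total χ², and the p-value using the closed-form tail probability that exists for whole-number df. The helper name chi2_sf_int_df is illustrative, and this is not necessarily how the calculator computes it.

```python
import math

def chi2_sf_int_df(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    h = x / 2
    if df % 2 == 0:
        # Even df: finite sum of Poisson-like terms
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: complementary error function plus a finite sum
    terms = (h ** (j + 0.5) / math.gamma(j + 1.5) for j in range((df - 1) // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * sum(terms)

observed = [18, 22, 16, 25, 19, 20]   # counts from the worked example
expected = [120 / 6] * 6              # 20 per face for a fair die

contrib = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contrib)
df = len(observed) - 1
p = chi2_sf_int_df(chi2, df)

print(round(chi2, 2), df, round(p, 2))  # 2.5 5 0.78
print("largest contribution: face", contrib.index(max(contrib)) + 1)  # face 4
```

Face 4 contributes half of the total (1.25 of 2.50), yet the overall statistic is still far below the level needed for significance.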


Interpreting the P-Value

The p-value tells you the probability of observing a chi-square statistic as large as yours (or larger) by chance, assuming the null hypothesis is true. The interpretation is straightforward:

p < 0.05 (significant): The observed data differs significantly from what the null hypothesis predicts. Reject H₀. There is evidence that the data does not fit the expected distribution.
p ≥ 0.05 (not significant): The data is consistent with the null hypothesis. Fail to reject H₀. The differences between observed and expected could plausibly be due to random chance alone.
p < 0.01 (highly significant): Very strong evidence against H₀. The result would occur less than 1% of the time by chance if H₀ were true.

Note that failing to reject H₀ does not prove it is true — it only means the data is insufficient to disprove it. The test is sensitive to sample size: with large enough n, even tiny deviations from expected become statistically significant.
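
The sample-size effect is easy to see numerically: if every count is multiplied by 10 while the proportions stay the same, χ² is also multiplied by 10. The sketch below uses three categories (df = 2) with a hypothetical tenfold scaling of the counts, and relies on the closed form exp(−χ²/2) that holds only for df = 2.

```python
import math

def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(chi2):
    # Closed form for the upper tail, valid only when df = 2
    return math.exp(-chi2 / 2)

small = chi_square_statistic([15, 20, 25], [20, 20, 20])
large = chi_square_statistic([150, 200, 250], [200, 200, 200])  # same proportions, 10x the data

print(small, large)  # 2.5 25.0
print(p_value_df2(small) > 0.05, p_value_df2(large) < 0.01)  # True True
```

The same pattern of proportions is not significant with 60 observations but highly significant with 600.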

Critical Values for Chi-Square

You can also compare χ² directly to a critical value from the chi-square distribution table for your chosen α and df:

df = 1: critical value = 3.841 (α = 0.05), 6.635 (α = 0.01)
df = 2: critical value = 5.991 (α = 0.05), 9.210 (α = 0.01)
df = 3: critical value = 7.815 (α = 0.05), 11.345 (α = 0.01)
df = 4: critical value = 9.488 (α = 0.05), 13.277 (α = 0.01)
df = 5: critical value = 11.070 (α = 0.05), 15.086 (α = 0.01)

If your calculated χ² exceeds the critical value for your df and α, reject H₀. Our calculator computes the exact p-value using a numerical approximation of the regularized incomplete gamma function, so you don't need to look up critical values manually.
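
The p-value equals 1 − P(df/2, χ²/2), where P is the regularized lower incomplete gamma function. One common way to approximate P is a power series, sketched below. This is an assumption about a reasonable approach, not the calculator's actual code, and the function names are illustrative.

```python
import math

def lower_regularized_gamma(a, x, tol=1e-12, max_terms=10_000):
    """Series approximation of P(a, x) for a > 0; best for moderate x."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)          # next term: x^n / (a (a+1) ... (a+n))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi_square_p_value(chi2, df):
    """Upper-tail p-value; loses precision when the result is extremely small."""
    return 1.0 - lower_regularized_gamma(df / 2, chi2 / 2)

# Each alpha = 0.05 critical value from the table should give p close to 0.05
for df, crit in [(1, 3.841), (2, 5.991), (3, 7.815), (4, 9.488), (5, 11.070)]:
    print(df, round(chi_square_p_value(crit, df), 3))
```

Because the p-value is formed as 1 − P, very small p-values are affected by rounding error; a production implementation would typically switch to a continued-fraction form for large χ².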


Assumptions and Limitations

The chi-square goodness of fit test has several requirements:

Minimum expected frequency: Each expected count (E) should be at least 5. When expected frequencies are below 5, the chi-square approximation becomes unreliable; consider combining sparse categories or using an exact multinomial test instead.
Independent observations: Each observation must be independent. Repeated measures on the same subject violate this assumption.
Counts, not proportions: O and E must be frequencies (counts), not percentages. If you have percentages, convert them to counts first by multiplying each proportion (percentage / 100) by the total n.
Random sampling: Data should be obtained by random sampling from the population you are studying.
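
Two of the requirements above can be checked mechanically before running the test: converting percentages to counts, and flagging categories whose expected count falls below 5. The percentages and sample size below are hypothetical, and the function names are illustrative.

```python
def counts_from_percentages(percentages, n):
    """Convert expected percentages into expected counts for sample size n."""
    return [n * pct / 100 for pct in percentages]

def low_expected(expected, minimum=5):
    """Return the indexes of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]

# Hypothetical claimed percentages for four categories, sample size 100
expected = counts_from_percentages([50, 30, 17, 3], 100)
print(expected)                # [50.0, 30.0, 17.0, 3.0]
print(low_expected(expected))  # [3]
```

Here the last category has an expected count of 3, so the chi-square approximation would be questionable without a larger sample or merged categories.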

The chi-square test is non-directional: it tests only whether the data deviates significantly from the expected values, not in which direction. For questions about whether observed proportions match expected proportions from a theoretical model, the chi-square goodness of fit is the appropriate test. For comparing two group distributions or testing independence between two categorical variables, use a chi-square test of independence with a contingency table. Our normal distribution calculator can help you work with p-values and critical regions for other statistical tests.

