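A t-test is a statistical hypothesis test used to determine whether a sample mean differs significantly from a reference value (one-sample test) or whether the means of two groups differ significantly from each other (two-sample test). It was developed by William Sealy Gosset and published under the pseudonym "Student" in 1908. T-tests are used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small to moderate. They rely on the t-distribution rather than the normal distribution because the t-distribution accounts for the extra uncertainty of that estimate.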

When should you use a t-test vs a z-test?

Use a t-test when the population standard deviation is unknown and you are estimating it from sample data. Use a z-test when the population standard deviation is known; a common rule of thumb also allows a z-test when the sample size is large (n > 30). In practice, t-tests are almost always used because population standard deviations are rarely known. For very large samples, the t-distribution and the normal distribution converge, so the choice makes little practical difference.

What does the p-value mean in a t-test?

The p-value is the probability of observing a t-statistic at least as extreme as the one calculated, assuming the null hypothesis (H₀) is true. A p-value of 0.03 means that, if H₀ were true, random sampling alone would produce a result this extreme only 3% of the time; it is not the probability that H₀ is true. The conventional threshold is α = 0.05: if p < 0.05, you reject the null hypothesis and call the result statistically significant. If p ≥ 0.05, you fail to reject H₀.
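The definition above can be checked by simulation. The sketch below is a minimal illustration, assuming NumPy and SciPy are installed and that the population is normal. Because the t-statistic does not depend on the population's location or scale when H₀ is true, it draws samples with mean 0 and SD 1 without loss of generality. The helper name `simulated_p_value` is hypothetical. It counts how often a null-true t-statistic is at least as extreme as an observed value (t = 1.860 with n = 25, the figures from the step-by-step example on this page) and compares that frequency with the exact two-tailed p-value.

```python
# Sketch: assumes NumPy and SciPy are installed and the population is normal.
import numpy as np
from scipy import stats


def simulated_p_value(t_obs, n, n_sims=100_000, seed=0):
    """Estimate a two-tailed p-value by simulating t-statistics with H0 true.

    `simulated_p_value` is a hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # H0 is true by construction: every sample comes from a population with mean 0, SD 1
    samples = rng.standard_normal((n_sims, n))
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)            # sample SD with the n - 1 divisor
    t_null = (means - 0.0) / (sds / np.sqrt(n))  # t-statistic of each simulated sample
    # Share of null t-statistics at least as extreme (in either tail) as t_obs
    return float(np.mean(np.abs(t_null) >= abs(t_obs)))


t_obs, n = 1.860, 25
print("simulated:", simulated_p_value(t_obs, n))
print("exact:    ", 2 * stats.t.sf(abs(t_obs), df=n - 1))
```

Both numbers should land close to 0.075; the simulated value varies slightly with the seed and the number of simulations.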

What is the difference between a one-tailed and two-tailed t-test?

A two-tailed test checks whether the means differ in either direction (H₁: μ ≠ μ₀). A one-tailed test checks for a difference in one specific direction: either greater (right-tailed, H₁: μ > μ₀) or less (left-tailed, H₁: μ < μ₀). Two-tailed tests are more conservative and more commonly used because researchers rarely know in advance in which direction a difference will occur. One-tailed tests have more statistical power to detect an effect in the predicted direction, but they cannot detect an effect in the opposite direction and require strong prior justification.
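Computationally, the tail types differ only in which part of the t-distribution the p-value is taken from. A minimal sketch, assuming SciPy is installed; the function name `t_test_p_value` is hypothetical, and its tail labels borrow the names SciPy uses for its `alternative` parameter.

```python
# Sketch: assumes SciPy is installed; `t_test_p_value` is a hypothetical helper.
from scipy import stats


def t_test_p_value(t_stat, df, tail="two-sided"):
    """Convert a t-statistic into a p-value for the chosen alternative hypothesis."""
    if tail == "two-sided":   # H1: mu != mu0, area in both tails beyond |t|
        return 2 * stats.t.sf(abs(t_stat), df)
    if tail == "greater":     # right-tailed, H1: mu > mu0, area above t
        return stats.t.sf(t_stat, df)
    if tail == "less":        # left-tailed, H1: mu < mu0, area below t
        return stats.t.cdf(t_stat, df)
    raise ValueError("tail must be 'two-sided', 'greater', or 'less'")


for tail in ("two-sided", "greater", "less"):
    print(tail, round(t_test_p_value(1.860, 24, tail), 4))
```

For a positive t-statistic, the right-tailed p-value is exactly half the two-tailed one (about 0.038 versus 0.075 for t = 1.860, df = 24), which is where the extra power comes from. The left-tailed p-value for the same t is large, because the data point the other way.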

What are degrees of freedom in a t-test?

Degrees of freedom (df) represent the number of independent pieces of information available to estimate variability. For a one-sample t-test, df = n − 1. For a two-sample Welch t-test, df is calculated using the Welch–Satterthwaite equation, which accounts for unequal variances between groups and usually gives a non-integer value. As df increases, the t-distribution approaches the normal distribution: critical values shrink, so the same t-statistic yields a smaller p-value.
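To see critical values shrink as df grows, the sketch below (assuming SciPy is installed) prints the two-tailed critical value at α = 0.05 for several df values next to the corresponding value from the normal distribution.

```python
# Sketch: assumes SciPy is installed.
from scipy import stats

alpha = 0.05
# Two-tailed critical value: alpha / 2 of the probability lies beyond it in each tail
for df in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    print(f"df = {df:>4}: t_crit = {t_crit:.3f}")

# Limit as df grows without bound: the standard normal critical value
print(f"normal:     z_crit = {stats.norm.ppf(1 - alpha / 2):.3f}")
```

The t critical values decrease steadily and approach the normal value of about 1.96, which is why a t-test and a z-test give nearly identical answers for large samples.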

How do you interpret t-test results?

First check the p-value against your significance level (α, typically 0.05). If p < α, reject H₀ — the difference is statistically significant. Also examine the t-statistic: larger absolute values indicate larger differences relative to variability. Report the t-statistic, df, and p-value (e.g., t(28) = 2.45, p = 0.021). Statistical significance does not imply practical significance — always consider the effect size and context.
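The interpretation steps above can be wrapped in a small helper that returns the p-value, the decision, and a report in the format shown (t(df) = …, p = …). This is a sketch, assuming SciPy is installed and a two-tailed test; `interpret_t_test` is a hypothetical name, not part of the calculator.

```python
# Sketch: assumes SciPy and a two-tailed test; `interpret_t_test` is hypothetical.
from scipy import stats


def interpret_t_test(t_stat, df, alpha=0.05):
    """Return the two-tailed p-value, the decision at alpha, and a report string."""
    p = 2 * stats.t.sf(abs(t_stat), df)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    # Whole-number df (one-sample, pooled) print without decimals; Welch df usually do not
    df_text = f"{df:g}" if float(df).is_integer() else f"{df:.2f}"
    return p, decision, f"t({df_text}) = {t_stat:.2f}, p = {p:.3f}"


p, decision, report = interpret_t_test(2.45, 28)
print(report, "->", decision)
```

The helper only covers statistical significance; effect size still has to be judged separately, as the paragraph above stresses.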

What are common real-world uses of t-tests?

T-tests are used across medicine, education, psychology, and business. In clinical trials, a one-sample t-test might compare a new drug's average effect to an established benchmark. A two-sample t-test is used in A/B testing to compare conversion rates between two versions of a webpage or email campaign. In education research, t-tests compare test score improvements between a control class and an experimental teaching method. In manufacturing quality control, they detect whether a production line's mean output has shifted from the target specification.

What is Cohen's d and why does it matter alongside the p-value?

Cohen's d is the most widely reported effect size for t-tests: for two groups, d = (mean difference) / (pooled SD); for one sample, d = (x̄ − μ₀) / s. It expresses the size of the effect in standard-deviation units, independent of sample size. Conventional benchmarks: d = 0.2 (small), d = 0.5 (medium), d = 0.8 (large). A study with 10,000 participants per group might produce p < 0.001 for d = 0.05, a highly significant but practically trivial effect. Conversely, a small study with d = 0.8 might fail to reach significance (p = 0.10) because of low statistical power. Always report both the p-value and the effect size to give a complete picture.
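Both forms of Cohen's d are short calculations. A minimal sketch using only the standard library; the function names are hypothetical, the two-group summary statistics are made-up illustration values, and the one-sample call reuses the heart-rate example from this page.

```python
# Sketch: standard library only; function names and group values are hypothetical.
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def cohens_d_one_sample(mean, sd, mu0):
    """One-sample d: distance from the reference value in sample-SD units."""
    return (mean - mu0) / sd


print(round(cohens_d(105.0, 15.0, 40, 100.0, 15.0, 40), 3))  # hypothetical groups
print(round(cohens_d_one_sample(73.2, 8.6, 70), 3))           # heart-rate example
```

The heart-rate example gives d ≈ 0.37, a small-to-medium effect by the benchmarks above, even though that test did not reach significance.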

T-Test Calculator: One-Sample & Two-Sample T-Test

Calculates the t-statistic, degrees of freedom, and p-value for one-sample and two-sample hypothesis tests.

How to Use the T-Test Calculator

This t-test calculator supports both a one-sample t-test (comparing a sample mean to a known value) and a two-sample t-test (comparing two independent group means). Select the tail type; two-tailed is appropriate for most research questions. Enter your sample statistics, and the calculator instantly returns the t-statistic, degrees of freedom, p-value, and a decision at α = 0.05.

For survey-based research, pair this tool with our margin of error calculator to understand the precision of your estimates before running a formal hypothesis test.

T-Test Formulas

The t-test converts a difference in means into a standardized score relative to sampling variability.

One-Sample T-Test

When comparing a sample mean (x̄) to a known population mean (μ₀):

t = (x̄ − μ₀) / (s / √n)

Where s is the sample standard deviation and n is the sample size. Degrees of freedom: df = n − 1.
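The formula translates directly into code. A minimal sketch that uses Python's standard `statistics` module for x̄ and s, with SciPy's `ttest_1samp` as a cross-check (assuming SciPy is installed); `one_sample_t` is an illustrative name and the sample values are hypothetical.

```python
# Sketch: `one_sample_t` and the sample data are hypothetical; SciPy is only a cross-check.
import math
import statistics

from scipy import stats


def one_sample_t(data, mu0):
    """Return t = (x̄ − μ₀) / (s / √n) and df = n − 1 from raw observations."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD with the n − 1 divisor
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


sample = [72, 68, 75, 80, 66, 71, 77, 74]  # hypothetical measurements
t, df = one_sample_t(sample, mu0=70)
print(f"t = {t:.3f}, df = {df}")
print(stats.ttest_1samp(sample, popmean=70))  # statistic should match t above
```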

Two-Sample T-Test (Welch)

For comparing two independent group means:

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

Degrees of freedom use the Welch–Satterthwaite equation, which adjusts for unequal group variances. This calculator uses Welch's t-test by default because it performs well whether or not the variances are equal.
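For reference, the Welch–Satterthwaite equation is df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)]. The sketch below computes Welch's t, this df, and the two-tailed p-value from summary statistics, and checks the result against SciPy's `ttest_ind_from_stats` with `equal_var=False`. It assumes SciPy is installed; `welch_t_test` is a hypothetical name and the group statistics are made-up values, so this illustrates the method rather than this calculator's own code.

```python
# Sketch: assumes SciPy is installed; `welch_t_test` and the inputs are hypothetical.
import math

from scipy import stats


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t, Welch–Satterthwaite df, and two-tailed p from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard error of each group mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


# Hypothetical groups with clearly unequal spread
t, df, p = welch_t_test(5.2, 1.1, 20, 4.5, 2.3, 15)
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.4f}")
print(stats.ttest_ind_from_stats(5.2, 1.1, 20, 4.5, 2.3, 15, equal_var=False))
```

The resulting df (about 18.8 here) always falls between min(n₁, n₂) − 1 and n₁ + n₂ − 2, and moves toward the lower end when one group's variance dominates.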


Understanding P-Values and Statistical Significance

The p-value answers the question: "If the null hypothesis were true, how often would we see a result at least this extreme by chance?" A p-value of 0.05 means such a result would occur about 5% of the time under H₀. The conventional cutoff is α = 0.05, but α = 0.01 is used in stricter contexts (medical trials, for example) and α = 0.10 in exploratory research.

Important caveats: a significant p-value does not mean the effect is large or practically meaningful. Always report the effect size (Cohen's d for t-tests) alongside the p-value. Also, p-values are sensitive to sample size — very large samples can produce significant p-values for trivially small differences.
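The sample-size caveat is easy to demonstrate. The sketch below, assuming SciPy is installed, holds a trivially small difference fixed (two hypothetical groups with SD 1 whose means differ by 0.05, so d = 0.05) and increases the number of participants per group.

```python
# Sketch: assumes SciPy is installed; the effect size and group sizes are hypothetical.
from scipy import stats

mean_diff, sd = 0.05, 1.0  # fixed, practically trivial difference (d = 0.05)
for n in (100, 1_000, 10_000, 100_000):  # participants per group
    res = stats.ttest_ind_from_stats(mean_diff, sd, n, 0.0, sd, n, equal_var=False)
    print(f"n = {n:>7} per group: t = {res.statistic:.2f}, p = {res.pvalue:.2g}")
```

The effect never changes, yet the p-value falls from clearly non-significant at n = 100 to below 0.001 at n = 10,000 per group.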

One-Tailed vs Two-Tailed Tests

Choose two-tailed when you are testing for any difference between means (most common). Choose right-tailed when you predict the sample mean is greater than the reference value, or left-tailed when you predict it is smaller. One-tailed tests have more power but require strong prior justification — using them to "fish" for significance is considered p-hacking.

Step-by-Step Example: One-Sample T-Test

A researcher believes the average resting heart rate in a population is 70 bpm. She measures a sample of 25 people and finds x̄ = 73.2 bpm, s = 8.6 bpm.

State H₀: μ = 70; H₁: μ ≠ 70 (two-tailed)
Calculate t: t = (73.2 − 70) / (8.6 / √25) = 3.2 / 1.72 = 1.860
Degrees of freedom: df = 25 − 1 = 24
Find p-value: for t = 1.860, df = 24, two-tailed → p ≈ 0.075
Decision: p = 0.075 ≥ 0.05 → fail to reject H₀. The evidence is not strong enough to conclude the mean differs from 70 bpm at the 5% significance level.
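The same steps can be reproduced in a few lines. A minimal sketch, assuming SciPy is installed, using the summary statistics from the example above.

```python
# Sketch: assumes SciPy is installed; inputs are the heart-rate example's statistics.
import math

from scipy import stats

x_bar, s, n, mu0 = 73.2, 8.6, 25, 70

se = s / math.sqrt(n)           # standard error: 8.6 / 5 = 1.72
t = (x_bar - mu0) / se          # 3.2 / 1.72
df = n - 1
p = 2 * stats.t.sf(abs(t), df)  # two-tailed p-value
decision = "reject H0" if p < 0.05 else "fail to reject H0"
print(f"t = {t:.3f}, df = {df}, p = {p:.3f} -> {decision}")
```

The output matches the hand calculation: t ≈ 1.860, df = 24, p ≈ 0.075, fail to reject H₀.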

To express the same result as an interval estimate, use our confidence interval calculator.
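A 95% confidence interval and a two-tailed test at α = 0.05 agree: H₀ is rejected exactly when μ₀ falls outside the interval. The sketch below, assuming SciPy is installed, computes the interval x̄ ± t_crit · s/√n for the heart-rate example.

```python
# Sketch: assumes SciPy is installed; inputs are the heart-rate example's statistics.
import math

from scipy import stats

x_bar, s, n = 73.2, 8.6, 25
se = s / math.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed critical value for alpha = 0.05
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval runs from about 69.65 to 76.75 bpm. It contains 70, which is consistent with failing to reject H₀ at the 5% level.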


Common Mistakes in T-Tests

Confusing one-tailed and two-tailed — always pre-specify the tail type before collecting data
Misinterpreting p-value as effect size — a p-value of 0.001 does not mean a large effect, only a highly unlikely result under H₀
Using t-tests on non-independent samples — for paired data (before/after measurements on the same subjects), use a paired t-test, not an independent two-sample test
Ignoring assumptions — t-tests assume approximately normal data; for small n with heavily skewed data, consider a non-parametric alternative such as the Mann-Whitney U test (two independent samples) or the Wilcoxon signed-rank test (one-sample or paired data)
Multiple comparisons without correction — running many t-tests increases the false positive rate; apply Bonferroni or similar corrections
Wrong test for categorical data — t-tests compare means of continuous variables; for testing whether categorical counts differ from expected frequencies, use our chi-square calculator instead
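The paired-data mistake in the list above is easy to see numerically. The sketch below, assuming SciPy is installed, runs a paired t-test on hypothetical before/after scores for the same six subjects, confirms that it equals a one-sample t-test on the differences, and contrasts it with an (incorrect) independent two-sample test on the same numbers.

```python
# Sketch: assumes SciPy is installed; the before/after scores are hypothetical.
from scipy import stats

before = [12.1, 14.3, 11.8, 13.5, 12.9, 15.0]
after = [12.9, 14.8, 12.6, 13.9, 13.8, 15.4]

# Correct for paired data: test the within-subject differences
paired = stats.ttest_rel(after, before)
diffs = [a - b for a, b in zip(after, before)]
on_diffs = stats.ttest_1samp(diffs, popmean=0)  # identical to the paired test

# Incorrect for paired data: treats the two lists as unrelated groups
independent = stats.ttest_ind(after, before, equal_var=False)

print(f"paired:      t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"differences: t = {on_diffs.statistic:.2f}, p = {on_diffs.pvalue:.4f}")
print(f"independent: t = {independent.statistic:.2f}, p = {independent.pvalue:.4f}")
```

Every subject improves slightly, so the paired test is clearly significant. The independent test buries that consistent shift under the large between-subject spread and is not.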

