How to Use the Linear Regression Calculator
This linear regression calculator takes comma-separated x and y values — both lists must have the same number of entries. The calculator computes the least-squares regression line, slope, y-intercept, Pearson r, and R² instantly. Optionally enter a new x value to get a predicted y from the fitted equation.
For example, to analyze how study hours (x) predict exam scores (y), enter 1, 2, 3, 4, 5 as x and 55, 65, 70, 80, 90 as y. See our standard deviation calculator to first understand the spread of each variable before fitting a regression.
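The input step can be sketched in Python as follows. This is only an illustration of the "same number of entries" rule, not the calculator's actual code; the helper names `parse_values` and `parse_pairs` are hypothetical.

```python
def parse_values(text):
    """Turn a comma-separated string such as "1, 2, 3" into a list of floats."""
    return [float(part) for part in text.split(",") if part.strip()]


def parse_pairs(x_text, y_text):
    """Parse both lists and check that they can be paired point by point."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"x has {len(xs)} values but y has {len(ys)}")
    if len(xs) < 2:
        raise ValueError("at least two data points are needed to fit a line")
    return xs, ys


# Study-hours example from the text above.
xs, ys = parse_pairs("1, 2, 3, 4, 5", "55, 65, 70, 80, 90")
print(xs)  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(ys)  # [55.0, 65.0, 70.0, 80.0, 90.0]
```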
The Least Squares Regression Formulas
Simple linear regression finds the line y = mx + b that minimizes the sum of squared residuals (SSres). The formulas are:
- Slope: m = [n·Σxy − Σx·Σy] / [n·Σx² − (Σx)²]
- Intercept: b = (Σy − m·Σx) / n
- R²: 1 − SSres / SStot
- Pearson r: sign(m) × √R²
Where n is the number of data points, Σxy is the sum of each x·y product, and SStot is the total sum of squared deviations from the mean of y.
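The formulas above translate almost line for line into code. The following is a minimal sketch, assuming plain Python lists as input; the function name `linear_regression` and its error handling are illustrative choices, not part of the calculator.

```python
import math


def linear_regression(xs, ys):
    """Return slope, intercept, R² and Pearson r using the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denom   # slope
    b = (sum_y - m * sum_x) / n                # intercept

    mean_y = sum_y / n
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R² is undefined")

    r_squared = 1 - ss_res / ss_tot
    # Pearson r = sign(m) * sqrt(R²); max() guards against tiny negative rounding.
    r = math.copysign(math.sqrt(max(r_squared, 0.0)), m)
    return m, b, r_squared, r


# Study-hours example: x = hours, y = exam score.
m, b, r_squared, r = linear_regression([1, 2, 3, 4, 5], [55, 65, 70, 80, 90])
print(m, b)                  # 8.5 46.5
print(round(r_squared, 4))   # 0.9897
```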
Interpreting R²
R² = 0.00 means the regression line explains none of the variance in y; R² = 1.00 means it explains all of it, with every data point lying exactly on the line. Common benchmarks: R² < 0.30 is weak, 0.30–0.70 is moderate, and > 0.70 is strong, though these thresholds depend heavily on the field. Social science data often has R² around 0.20–0.40, while engineering data may routinely exceed 0.95.
Pearson r vs R²
For simple linear regression, R² equals r² (Pearson correlation squared). The correlation r ranges from −1 to +1 and tells you both the direction (positive or negative) and strength of the linear relationship. R² is always non-negative and only tells you the proportion of explained variance. If r = −0.90, then R² = 0.81 — the variables are strongly negatively correlated, and 81% of variance in y is explained by x.
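The relationship between r and R² can be checked numerically. The sketch below computes r from its covariance form and R² from the fitted line, then compares them; the data values are made up for illustration and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from mean-centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """R² = 1 - SSres / SStot for the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative data with a downward trend (not from the article).
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]

r = pearson_r(xs, ys)
r2 = r_squared(xs, ys)
print(r < 0)                    # True: r keeps the direction
print(math.isclose(r * r, r2))  # True: R² equals r squared
```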
Both measures assume a linear relationship. If the true relationship is curved (e.g., quadratic), linear regression can produce a low R² even when y is perfectly determined by x. Always plot your data before interpreting regression results.
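A small sketch makes the point concrete. Here y is exactly x², so it is fully determined by x, yet a straight-line fit over a range symmetric around zero explains none of the variance. The data and the helper name `fit_and_score` are illustrative assumptions.

```python
def fit_and_score(xs, ys):
    """Fit y = m*x + b by least squares and return (m, b, R²)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # perfectly predictable, but not linear

m, b, r2 = fit_and_score(xs, ys)
print(m, b)  # 0.0 2.0  (a flat line at the mean of y)
print(r2)    # 0.0
```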
Prediction vs Extrapolation
The regression equation is most reliable for predicting y values that fall within the range of x values used to build the model — this is called interpolation. Extrapolating beyond this range (predicting y for x values outside your dataset) is risky because the linear relationship may not hold there. For example, a linear model for plant growth between 10°C and 30°C may produce nonsensical predictions at 0°C or 50°C. When you need to estimate a value between two known data points without fitting a full regression model, our interpolation calculator handles linear, polynomial, and other interpolation methods directly.
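One way to guard against accidental extrapolation is to flag any prediction whose x lies outside the observed range. The sketch below assumes the slope 8.5 and intercept 46.5 obtained by fitting the study-hours example from earlier on this page; the function name `predict` and the 100-point exam scale are assumptions for illustration.

```python
def predict(m, b, x_new, xs):
    """Return (predicted y, True if x_new is outside the range of xs)."""
    y_hat = m * x_new + b
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return y_hat, is_extrapolation


hours = [1, 2, 3, 4, 5]   # x values used to build the model
m, b = 8.5, 46.5          # least-squares fit of the study-hours example

print(predict(m, b, 3.5, hours))  # (76.25, False): interpolation
print(predict(m, b, 12, hours))   # (148.5, True): beyond the data, and above
                                  # 100 if the exam is scored out of 100
```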
Regression is closely tied to the Pearson correlation coefficient (r). Use our correlation coefficient calculator to check the strength of the linear relationship between your variables before fitting a model.
Common Mistakes in Linear Regression
- Confusing correlation with causation — a strong r does not mean x causes y; a lurking variable may drive both
- Ignoring outliers — a single influential point can dramatically shift the regression line; always visualize your data
- Extrapolating too far — predictions outside the range of observed data are unreliable
- Using R² alone — a high R² does not guarantee a good model; check residual plots for patterns
- Mismatched data lengths — every x value must have a corresponding y value; missing data must be handled before regression
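The outlier point in the list above is easy to demonstrate. In this sketch, five made-up points lie exactly on y = 2x, and a single extra point more than doubles the fitted slope; the data and the helper name `slope` are illustrative only.

```python
def slope(xs, ys):
    """Least-squares slope from mean-centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # exactly y = 2x

clean = slope(xs, ys)
with_outlier = slope(xs + [6], ys + [30])   # one influential point at (6, 30)

print(clean)                    # 2.0
print(round(with_outlier, 2))   # 4.57
```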
For a deeper look at the center of your data, see our mean, median, and mode calculator.
Step-by-Step Example
Data: x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5]
- n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55
- Slope: m = [5(66) − (15)(20)] / [5(55) − 15²] = [330 − 300] / [275 − 225] = 30/50 = 0.60
- Intercept: b = (20 − 0.60 × 15) / 5 = (20 − 9) / 5 = 2.20
- Equation: y = 0.60x + 2.20
- Prediction for x = 6: y = 0.60(6) + 2.20 = 5.80
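To check the hand arithmetic, the same steps can be recomputed directly from the two data lists. This is a minimal sketch that follows the slope and intercept formulas given earlier; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Sums used by the slope and intercept formulas.
n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
print(f"n={n}, sum_x={sum_x}, sum_y={sum_y}, sum_xy={sum_xy}, sum_x2={sum_x2}")

# Slope and intercept of the least-squares line.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
print(f"y = {m:.2f}x + {b:.2f}")

# Predicted y for a new x value.
x_new = 6
y_new = m * x_new + b
print(f"prediction for x = {x_new}: {y_new:.2f}")
```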