What is linear regression?

Linear regression is a statistical method for modeling the relationship between a dependent variable (y) and one independent variable (x) using a straight line: y = mx + b. The goal is to find the line that minimizes the sum of squared vertical distances from each data point to the line — this is called the ordinary least squares (OLS) method. It is used to describe relationships, make predictions, and understand how much one variable changes when another does.

How do you calculate the regression line?

The slope is m = [n·Σxy − Σx·Σy] / [n·Σx² − (Σx)²] and the intercept is b = (Σy − m·Σx) / n. These formulas use four sums (Σx, Σy, Σxy, and Σx²) together with the number of data points n. For example, with five data points you set n = 5, compute the four sums, and plug them into the formulas. Most statistical software and this calculator do this automatically; you just need to supply the (x, y) pairs.
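The two formulas translate almost line for line into code. The following is a minimal Python sketch, not the calculator's actual implementation; the function name `fit_line` is made up for illustration, and the data is the study-hours example used on this page.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


hours = [1, 2, 3, 4, 5]
scores = [55, 65, 70, 80, 90]
m, b = fit_line(hours, scores)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 8.50x + 46.50
```

The explicit check on the denominator matters because identical x values would describe a vertical line, which the formula cannot represent.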

What is R² (coefficient of determination)?

R² measures the proportion of variance in y explained by the regression line. For a least-squares line with an intercept it ranges from 0 to 1: R² = 0.85 means the line explains 85% of the variability in y. R² is calculated as 1 − SSres/SStot, where SSres is the sum of squared residuals and SStot is the total sum of squares. A higher R² generally indicates a better fit, but it does not indicate causation, and in multiple regression adding more variables never decreases R², even if they are useless.
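As a rough illustration of the 1 − SSres/SStot definition, here is a small Python sketch. The helper name `r_squared` is hypothetical, and the slope and intercept passed in are the least-squares values for the study-hours data used on this page.

```python
def r_squared(xs, ys, m, b):
    """R^2 = 1 - SSres / SStot for the line y = m*x + b."""
    mean_y = sum(ys) / len(ys)
    # SSres: squared vertical distances from each point to the line.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # SStot: squared deviations of y from its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    return 1 - ss_res / ss_tot


hours = [1, 2, 3, 4, 5]
scores = [55, 65, 70, 80, 90]
# Slope 8.5 and intercept 46.5 are the least-squares fit for this data.
r2 = r_squared(hours, scores, 8.5, 46.5)
print(round(r2, 4))  # 0.9897
```

Here SSres is 7.5 and SStot is 730, so the line leaves only about 1% of the variation in scores unexplained.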

What is the difference between correlation and regression?

Correlation (Pearson r) measures the strength and direction of the linear relationship between x and y — it ranges from −1 to +1 and is symmetric (r between x and y equals r between y and x). Regression finds the equation of the best-fit line and is used for prediction — it is directional (predicting y from x is different from predicting x from y). R² = r² for simple linear regression, so a correlation of 0.9 gives R² = 0.81.
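The symmetry of r and the directionality of regression can be checked numerically. This is a sketch under assumptions: the helper names (`centered_sums`, `pearson_r`, `slope`) are invented for illustration, and the data is the study-hours example from this page.

```python
from math import sqrt


def centered_sums(xs, ys):
    """Return n*Sxx, n*Syy, n*Sxy built from the raw sums."""
    n = len(xs)
    sxx = n * sum(x * x for x in xs) - sum(xs) ** 2
    syy = n * sum(y * y for y in ys) - sum(ys) ** 2
    sxy = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    return sxx, syy, sxy


def pearson_r(xs, ys):
    sxx, syy, sxy = centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    sxx, _, sxy = centered_sums(xs, ys)
    return sxy / sxx


hours = [1, 2, 3, 4, 5]
scores = [55, 65, 70, 80, 90]

# Correlation is symmetric: swapping the variables changes nothing.
print(pearson_r(hours, scores) == pearson_r(scores, hours))  # True

# Regression is directional: the two slopes are different numbers.
print(round(slope(hours, scores), 4))  # 8.5
print(round(slope(scores, hours), 4))  # 0.1164

# For simple linear regression, the product of the two slopes equals r squared.
r = pearson_r(hours, scores)
print(abs(slope(hours, scores) * slope(scores, hours) - r ** 2) < 1e-12)  # True
```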

How do you predict values using the regression equation?

Once you have the equation y = mx + b, substitute the new x value and solve for y. For example, if the regression equation is y = 2.3x + 1.5, then for x = 10 you predict y = 2.3(10) + 1.5 = 24.5. This is called interpolation when the new x falls within the range of your data, and extrapolation when it falls outside. Extrapolation is risky — the linear relationship may not hold beyond the observed range.
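A prediction helper can also report whether the new x lies inside the observed range. The sketch below uses the equation from the example above; the function name `predict` and the observed x range of 0 to 20 are assumptions made purely for illustration.

```python
def predict(m, b, x_new, x_min, x_max):
    """Return (predicted y, label) for the line y = m*x + b.

    x_min and x_max are the smallest and largest x values in the data
    that was used to fit the line.
    """
    y_hat = m * x_new + b
    kind = "interpolation" if x_min <= x_new <= x_max else "extrapolation"
    return y_hat, kind


# Equation y = 2.3x + 1.5 from the text; the 0..20 data range is assumed.
y_inside, kind_inside = predict(2.3, 1.5, 10, x_min=0, x_max=20)
y_outside, kind_outside = predict(2.3, 1.5, 35, x_min=0, x_max=20)

print(f"x = 10 -> y = {y_inside:.1f} ({kind_inside})")
print(f"x = 35 -> y = {y_outside:.1f} ({kind_outside})")
```

Returning the label alongside the value makes it harder to use an extrapolated prediction without noticing.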

What does the slope mean in regression?

The slope (m) represents the average change in y for each one-unit increase in x. If m = 3.2, then y increases by 3.2 units on average for every 1-unit increase in x. A positive slope indicates a positive relationship; a negative slope indicates an inverse relationship. The slope has units: if x is in hours and y is in dollars, then m is in dollars per hour.

What are common real-world uses of linear regression?

Linear regression is everywhere: predicting house prices from square footage, forecasting sales from advertising spend, estimating crop yields from rainfall, predicting exam scores from study hours, and calibrating instruments (measuring one quantity against a known standard). In finance, the CAPM model uses linear regression to compute beta (a stock's sensitivity to market returns). In medicine, regression predicts blood pressure from body weight or drug dosage from patient size.

What does a good R² value look like?

What counts as a "good" R² depends entirely on the field. In physics experiments, R² > 0.99 is often expected because measurements follow well-defined physical laws with little noise. In economics, R² of 0.7 may be strong. In social science, R² of 0.4 can be meaningful because human behavior is complex and noisy. Never judge R² in isolation: a high R² with a non-linear pattern is misleading, and a low R² can still produce a useful model if the slope is statistically significant and practically meaningful.

What assumptions does linear regression make?

Linear regression assumes: (1) linearity — the relationship between x and y is approximately linear; (2) independence — observations are independent of each other; (3) homoscedasticity — the variance of residuals is constant across all x values; (4) normality of residuals — residuals are approximately normally distributed. Violations of these assumptions reduce the reliability of predictions and p-values. Always check a residual plot before trusting a regression model.
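The raw material for a residual plot is simply the list of residuals (observed y minus predicted y). A minimal Python sketch, assuming the study-hours data from this page and its least-squares fit y = 8.5x + 46.5; the function name `residuals` is hypothetical:

```python
def residuals(xs, ys, m, b):
    """Observed y minus predicted y for each data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


hours = [1, 2, 3, 4, 5]
scores = [55, 65, 70, 80, 90]
res = residuals(hours, scores, 8.5, 46.5)
print(res)       # [0.0, 1.5, -2.0, -0.5, 1.0]
print(sum(res))  # 0.0
```

For a least-squares line with an intercept the residuals always sum to zero, so the sum itself says nothing about fit quality. What to look for is a pattern against x, such as a curve (non-linearity) or a funnel shape (non-constant variance).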

Linear Regression Calculator

Finds the regression equation y = mx + b, Pearson r, and R² from any dataset, with prediction for new x values.

How to Use the Linear Regression Calculator

This linear regression calculator takes comma-separated x and y values — both lists must have the same number of entries. The calculator computes the least-squares regression line, slope, y-intercept, Pearson r, and R² instantly. Optionally enter a new x value to get a predicted y from the fitted equation.

For example, to analyze how study hours (x) predict exam scores (y), enter 1, 2, 3, 4, 5 as x and 55, 65, 70, 80, 90 as y. See our standard deviation calculator to first understand the spread of each variable before fitting a regression.
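How the calculator parses its input is not documented here, but the comma-separated format and the equal-length requirement can be sketched in a few lines of Python. The function names `parse_values` and `parse_pairs` are hypothetical.

```python
def parse_values(text):
    """Parse a comma-separated string such as "1, 2, 3" into floats."""
    return [float(part) for part in text.split(",") if part.strip()]


def parse_pairs(x_text, y_text):
    """Parse both lists and check that they can be paired up."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"x has {len(xs)} values but y has {len(ys)}")
    if len(xs) < 2:
        raise ValueError("at least two (x, y) pairs are required")
    return xs, ys


xs, ys = parse_pairs("1, 2, 3, 4, 5", "55, 65, 70, 80, 90")
print(xs)  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(ys)  # [55.0, 65.0, 70.0, 80.0, 90.0]
```

In this sketch, a non-numeric entry makes `float` raise a `ValueError`, so bad input fails loudly instead of being silently dropped.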

The Least Squares Regression Formulas

Simple linear regression finds the line y = mx + b that minimizes the sum of squared residuals (SSres). The formulas are:

Slope: m = [n·Σxy − Σx·Σy] / [n·Σx² − (Σx)²]
Intercept: b = (Σy − m·Σx) / n
R²: 1 − SSres / SStot
Pearson r: sign(m) × √R²

Where n is the number of data points, Σxy is the sum of each x·y product, and SStot is the total sum of squared deviations from the mean of y.
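The sign(m) factor in the Pearson r formula matters when the slope is negative, because √R² alone is never negative. Below is a short Python sketch that computes all four quantities together; the function name `regression_summary` and the downward-trending dataset are invented for illustration.

```python
from math import copysign, sqrt


def regression_summary(xs, ys):
    """Return (slope, intercept, R^2, Pearson r) for simple linear regression."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    mean_y = sum_y / n
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    # r takes its sign from the slope; max() guards against tiny negative
    # values of R^2 caused by floating-point rounding.
    r = copysign(sqrt(max(r2, 0.0)), m)
    return m, b, r2, r


# Hypothetical data with a downward trend.
m, b, r2, r = regression_summary([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])
print(round(m, 4), round(b, 4), round(r2, 4), round(r, 4))  # -0.6 5.8 0.6 -0.7746
```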

Interpreting R²

R² = 0.00 means the regression line explains nothing; R² = 1.00 means it explains everything perfectly. Common benchmarks: R² < 0.30 is weak, 0.30–0.70 is moderate, and > 0.70 is strong — though these thresholds depend heavily on the field. Social science data often has R² around 0.20–0.40, while engineering data may routinely exceed 0.95.

Pearson r vs R²

For simple linear regression, R² equals r² (Pearson correlation squared). The correlation r ranges from −1 to +1 and tells you both the direction (positive or negative) and strength of the linear relationship. R² is always non-negative and only tells you the proportion of explained variance. If r = −0.90, then R² = 0.81 — the variables are strongly negatively correlated, and 81% of variance in y is explained by x.

Both measures assume a linear relationship. If the true relationship is curved (e.g., quadratic), linear regression can produce a low R² even if y is perfectly predictable from x. Always plot your data before interpreting regression results.
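An extreme case makes the point concrete. In the Python sketch below, y is an exact quadratic function of x, yet the best-fit straight line is flat and R² is zero. The dataset and the function name `fit_and_r2` are made up for illustration.

```python
def fit_and_r2(xs, ys):
    """Return (slope, intercept, R^2) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    mean_y = sum_y / n
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [(x - 3) ** 2 for x in xs]  # [4, 1, 0, 1, 4], a perfect parabola

m, b, r2 = fit_and_r2(xs, ys)
print(m, b, r2)  # 0.0 2.0 0.0
```

The line y = 2 is simply the mean of y, so it explains none of the variation even though y is fully determined by x.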

Prediction vs Extrapolation

The regression equation is most reliable for predicting y values that fall within the range of x values used to build the model — this is called interpolation. Extrapolating beyond this range (predicting y for x values outside your dataset) is risky because the linear relationship may not hold there. For example, a linear model for plant growth between 10°C and 30°C may produce nonsensical predictions at 0°C or 50°C. When you need to estimate a value between two known data points without fitting a full regression model, our interpolation calculator handles linear, polynomial, and other interpolation methods directly.

Regression is closely related to the correlation coefficient (Pearson r). Use our correlation coefficient calculator to check the strength of the linear relationship between your variables before fitting a model.

Common Mistakes in Linear Regression

Confusing correlation with causation — a strong r does not mean x causes y; a lurking variable may drive both
Ignoring outliers — a single influential point can dramatically shift the regression line; always visualize your data
Extrapolating too far — predictions outside the range of observed data are unreliable
Using R² alone — a high R² does not guarantee a good model; check residual plots for patterns
Mismatched data lengths — every x value must have a corresponding y value; missing data must be handled before regression
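The outlier warning in the list above is easy to demonstrate. The Python sketch below fits the study-hours data from this page twice, once as given and once with a single extra point; the extra point (6 hours, score 20) and the function name `slope` are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


hours = [1, 2, 3, 4, 5]
scores = [55, 65, 70, 80, 90]
print(round(slope(hours, scores), 2))  # 8.5

# Add one hypothetical outlier: 6 hours of study but a score of 20.
hours_out = hours + [6]
scores_out = scores + [20]
print(round(slope(hours_out, scores_out), 2))  # -2.57
```

One point out of six is enough to flip the sign of the slope, which is why a scatter plot is worth a look before trusting the fitted line.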

For a deeper look at the center of your data (central tendency), see our mean, median, and mode calculator.

Step-by-Step Example

Data: x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5]

n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55
Slope: m = [5(66) − (15)(20)] / [5(55) − 15²] = [330 − 300] / [275 − 225] = 30/50 = 0.60
Intercept: b = (20 − 0.60 × 15) / 5 = (20 − 9) / 5 = 2.20
Equation: y = 0.60x + 2.20
Prediction for x = 6: y = 0.60(6) + 2.20 = 5.80
