Geometry of Regression
Geometric Interpretation of Linear Regression
A picture is worth a thousand words. This post on stats Stack Exchange gives a great description of the geometric representation of Linear Regression problems. Let’s see this in action using some simple examples.
The graphic below, which appeared in the original Stack Exchange post, captures the essence of Linear Regression very aptly.
Source: Stack Exchange
Overview
The geometric meaning of a Linear/Multiple Regression fit is the orthogonal projection of the observed response vector $y$ onto the column space of the design matrix $X$: the fitted vector $\hat{y}$ is the point of that subspace closest to $y$.
In terms of the more generally understood form of Linear Regression:
- With constant: $y = \beta_0 + \beta_1 x + \epsilon$
- Without constant: $y = \beta_1 x + \epsilon$
We will focus on regression with a constant.
The regression coefficients $\beta$ are the weights of the linear combination of the columns of $X$ (the constant column and $x$) that best approximates $y$; that best approximation is $\hat{y} = X\hat{\beta}$.
Additionally, the residual vector $e = y - \hat{y}$ is orthogonal to the column space of $X$, which is what produces the Pythagorean decomposition of the sums of squares shown at the end of this post.
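To make the projection view concrete before the SymPy walkthrough, here is a minimal NumPy sketch of the idea (an illustrative addition, using the same toy data as the analysis below): the hat matrix $H = X(X^{\top}X)^{-1}X^{\top}$ is symmetric and idempotent, the two defining properties of an orthogonal projection, and $Hy$ gives the fitted values.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.5, 5.0])

# Design matrix with a constant column
X = np.column_stack([np.ones_like(x), x])

# Hat matrix: projects any vector onto the column space of X
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H, H.T))    # True: H is symmetric
print(np.allclose(H @ H, H))  # True: projecting twice changes nothing
print(H @ y)                  # the fitted values, i.e. the projection of y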
Analysis
import numpy as np
import matplotlib.pyplot as plt
import sympy as sp
%matplotlib inline
sp.init_printing(use_unicode=True)
plt.style.use("ggplot")
Let’s create our data vectors $x$ and $y$, then wrap them in SymPy matrices: $Y$ as a column vector and $X$ as the design matrix with a constant column of ones.
x = np.array([1.0, 2, 3])
y = np.array([2, 2.5, 5])
Y = sp.Matrix(y)
Y
X = sp.Matrix(np.transpose([np.ones(len(x)), x]))
X
fig = plt.figure()
plt.scatter(x, y)
plt.xlim((0, 5))
plt.ylim((0, 6))
plt.title("Y vs X")
plt.xlabel("X")
plt.ylabel("Y")
plt.gcf().set_size_inches(10, 5)
plt.show()
Regression Coefficients
Linear regression coefficients are obtained by solving the normal equations: $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}Y$. Let’s calculate $\hat{\beta}$:
beta = ((X.transpose() * X) ** -1) * X.transpose() * Y  # use the SymPy vector Y, not the NumPy array y
beta
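As a sanity check on the closed-form solution, we can compare it with NumPy's least-squares solver (this cross-check is an addition; X_np and beta_np exist only for it):
# Cross-check the normal-equation solution with NumPy's least-squares solver
X_np = np.transpose([np.ones(len(x)), x])
beta_np, *_ = np.linalg.lstsq(X_np, y, rcond=None)
beta_np  # should match the entries of beta above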
Since we now have $\hat{\beta}$, we can compute the fitted values $\hat{y} = X\hat{\beta}$:
y_hat = X * beta
y_hat
fig = plt.figure()
plt.scatter(x, y)
plt.xlim((0, 5))
plt.ylim((0, 6))
plt.title("Y vs X | Regression Fit")
plt.xlabel("X")
plt.ylabel("Y")
plt.plot(x, [float(v) for v in y_hat], color='blue')  # convert SymPy values to floats for plotting
plt.gcf().set_size_inches(10, 5)
plt.show()
Error Analysis
Residuals for the model are given by $e = Y - \hat{y}$:
res = Y - y_hat  # subtract from the SymPy vector Y so the result is a column vector
res
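Because the model includes a constant column, the residual vector is orthogonal to every column of $X$, and in particular the residuals sum to zero. As a quick extra check (not in the original post), we can verify that $X^{\top} e$ is (numerically) the zero vector:
# X'e should be (numerically) the zero vector: the residuals are orthogonal
# to every column of X, including the column of ones
X.transpose() * res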
The average vector, $\bar{Y}$, repeats the mean of $y$ in every entry:
y_bar = float(np.mean(y)) * sp.Matrix(np.ones(len(y)))  # scalar mean times a column of ones
y_bar
We can also calculate the error of the average model, i.e. the model whose predicted values are just the average vector $\bar{Y}$:
kappa = y_bar - Y
kappa
Both $\hat{y}$ and $\bar{Y}$ lie in the column space of $X$ ($\bar{Y}$ is a multiple of the constant column), so their difference $\eta = \hat{y} - \bar{Y}$ also lies in that subspace:
eta = y_hat - y_bar
eta
Now from here we can check whether $\eta$ is orthogonal to the residual vector by computing their dot product:
dot_product = eta.transpose() * res
dot_product
Hence, we can see that $\eta^{\top} e$ is zero (up to floating-point precision), i.e. $\eta$ and the residual vector are orthogonal.
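One more small observation before relating the sums of squares (added here for completeness): since $\kappa = \bar{Y} - Y$, we have $-\kappa = \eta + e$, so the deviation of the observations from their mean splits into these two orthogonal pieces. A quick check:
# Y - y_bar decomposes as eta + res, so this should be the zero vector
-kappa - eta - res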
From here we can also derive the relationship between the Total Sum of Squares (SST), the Sum of Squares due to Regression (SSR), and the Sum of Squared Errors (SSE):
- SST can be represented by the squared norm of $\kappa$: $SST = \lVert Y - \bar{Y} \rVert^2$
- SSR can be represented by the squared norm of $\eta$: $SSR = \lVert \hat{y} - \bar{Y} \rVert^2$
- SSE can be represented by the squared norm of the residual vector: $SSE = \lVert Y - \hat{y} \rVert^2$
Since $-\kappa = \eta + e$ and $\eta \perp e$, we can use the Pythagorean Theorem to check the above relationship, i.e. $SST = SSR + SSE$:
kappa.norm() ** 2 - eta.norm() ** 2 - res.norm() ** 2
Hence, as we expected, the difference is zero (up to floating-point error), confirming that $SST = SSR + SSE$.
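As a small extension (not computed in the original post), the coefficient of determination follows directly from the same squared norms, $R^2 = SSR / SST$:
# R-squared from the squared norms computed above
R_squared = (eta.norm() ** 2) / (kappa.norm() ** 2)
R_squared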
Summary
Through this post, I demonstrated how linear/multiple regression can be interpreted geometrically, and I solved a simple linear regression model using linear algebra.