# Analytical vs Gradient Descent methods of solving linear regression

The Gradient Descent offers an iterative method to solve linear models. However, there is a traditional and direct way of solving it called as **normal equations**. In normal equations, you build a matrix where each record of observation becomes a row (`m`

rows) and each feature becomes a column. You prefix an additional column to represent the constant (`n+1`

columns). This matrix, represented as `X`

is of dimension `m x (n+1)`

. You represent the response variable as a vector `y`

of dimension `m x 1`

.

The formula to calculate the optimal coefficients is given by $\theta = (X^{T}X)^{-1}X^{T}y$. Where $\theta$ is a vector of shape `n+1`

containing $[\theta_{0}, \theta_{1} … \theta_{n}]$.

### Caveats when applying analytical technique¶

- In the analytical, normal equation method, there is no iteration to arrive at optimal $\theta$. You simply calculate it.
- You do not have to scale features. It is ok to have them in their native dimensions.

### Guidelines for choosing between GD and Normal equation¶

- GD needs you to play with $\alpha$ (learning rate), while normal equation does not.
- GD is an iterative process, while normal eq is not.
- GD shines well when you have a
**large number of attributes / features / independent variables**. The order of GD is given by $O(kn^{2})$ for`n`

features. - Normal equation needs to invert a matrix which is an expensive operation. Its time complexity is given by $O(n^{3})$.
- If you have
`>10,000`

independent variables, or if the number of observations / rows is less than number of independent variables (`m < (n+1)`

), then normal equation not produce a matrix that is invertible. You are better off with Gradient Descent regression. - If you have highly correlated features (multi-collinearity) or when you have more features than observations, you might end up with a non-invertible matrix for the normal equation. In these cases, you can choose GD or you can delete some features or regularization techniques if you want to continue with normal equation.
- GD is an approximation technique, while normal equation is a deterministic approach. GD might settle in a local minima and not global minima. Although, for linear regressions, the shape of the loss function is such that there is no local but only a global minima.