Introduction to Principal Component Analysis (PCA)

Thiago G. Martins

Principal component analysis (PCA) is a dimensionality reduction technique that is widely used in data analysis. Reducing the dimensionality of a dataset can be useful in several ways. For example, our ability to visualize data is limited to 2 or 3 dimensions. Working in fewer dimensions can also significantly reduce the computational time of some numerical algorithms. In addition, many statistical models suffer from high correlation between covariates, and PCA can be used to produce linear combinations of the covariates that are uncorrelated with each other.
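To make the last point concrete, here is a minimal sketch in Python with NumPy. It is not code from the original post: the synthetic data, the seed, and all variable names are assumptions chosen for illustration. It computes the components by eigendecomposing the sample covariance matrix, which is one common way to do PCA, and then checks that the resulting linear combinations are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (illustrative only): n = 200 observations of p = 3 variables,
# where the second variable is strongly correlated with the first.
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])  # shape (n, p)

# Center each column, then eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Reorder so the first component has the largest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores: linear combinations of the centered columns.
scores = Xc @ eigvecs

# The covariance of the scores is diagonal up to floating-point error,
# i.e. the components are uncorrelated with each other.
scores_cov = np.cov(scores, rowvar=False)
off_diag = scores_cov - np.diag(np.diag(scores_cov))
print("largest off-diagonal covariance:", np.abs(off_diag).max())
```

The variances of the components (the diagonal of `scores_cov`) equal the eigenvalues, so dropping the trailing columns of `scores` discards the directions with the least variance.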

More technically …

Assume you have $n$ observations of $p$ different variables. Define $X$ to be an $(n \times p)$ matrix where the $i$-th column of $X$ contains the observations of the $i$-th variable, $i = 1, \ldots, p$. Each row $x_i$ of $X$ can be represented as a point in a $p$-dimensional space. Therefore, …
