Types
PCA_Detailed[T] = object
  n_observations*: int
  n_features*: int
  n_components*: int
  projected*: Tensor[T]
  components*: Tensor[T]
  mean*: Tensor[T]
  explained_variance*: Tensor[T]
  explained_variance_ratio*: Tensor[T]
  singular_values*: Tensor[T]
  noise_variance*: T
Principal Component Analysis (PCA) object with full details
Contains the full PCA details from an input matrix of shape [n_observations, n_features]
- n_observations: The number of observations/samples from an input matrix of shape [n_observations, n_features]
- n_features: The number of features from the input matrix of shape [n_observations, n_features]
- n_components: The number of principal components requested in pca_detailed
- projected: The result of the PCA, of shape [n_observations, n_components], in descending order of explained variance
- components: a matrix of shape [n_features, n_components] to project new data on the same orthogonal basis
- mean: Per-feature empirical mean, equal to input.mean(axis=0)
- explained_variance: a vector of shape n_components, in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
- explained_variance_ratio: a vector of shape n_components, in descending order. Represents the percentage of variance explained by each component.
- singular_values: a vector of shape n_components, in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
- noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. It is equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.
The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D and match the feature column vectors.
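As an illustration of how these fields fit together, a minimal sketch (the input values are made up; toTensor and pca_detailed are assumed to be in scope via arraymancer):

import arraymancer

let x = @[@[2.5, 2.4], @[0.5, 0.7], @[2.2, 2.9],
          @[1.9, 2.2], @[3.1, 3.0], @[2.3, 2.7]].toTensor
let res = pca_detailed(x, n_components = 2)

echo res.n_observations      # 6
echo res.n_features          # 2
echo res.projected.shape     # [6, 2] -> [n_observations, n_components]
echo res.components.shape    # [2, 2] -> [n_features, n_components]
echo res.explained_variance  # 1D vector, in descending order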
Procs
proc `$`(pca: PCA_Detailed): string
- Returns a human-readable string summary of a PCA_Detailed object (Nim's standard stringification operator).
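Since `$` is the operator echo uses for stringification, a PCA_Detailed result can be printed directly; a minimal sketch with made-up data:

import arraymancer

let x = @[@[1.0, 2.0], @[3.0, 4.0], @[5.0, 6.1]].toTensor
echo pca_detailed(x, n_components = 1)  # echo calls `$` implicitly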
proc pca[T: SomeFloat](X: Tensor[T]; n_components = 2; center: static bool = true; n_oversamples = 5; n_power_iters = 2): tuple[projected: Tensor[T], components: Tensor[T]] {.noinit.}
Principal Component Analysis (PCA)
Project the input data X of shape [Observations, Features] into a new coordinate system where the axes (principal components) are in descending order of explained variance of the original X data, i.e. the first axis explains most of the variance.
The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape [Observations', Features]. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()
PCA requires:
- mean-centered features. This procedure does the centering by default. You can pass center = false if your preprocessing already centers the data (a sketch follows the note below).
- Features of the same scale/amplitude. Common options include min-max scaling, mean normalization, standardization (mean 0 and unit variance), and rescaling columns to unit length.
Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.
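A sketch of both centering options; the broadcasted .- operator and the rank-preserving mean(axis = 0) are assumptions about the tensor API:

import arraymancer

let x = @[@[2.5, 2.4], @[0.5, 0.7], @[2.2, 2.9],
          @[1.9, 2.2], @[3.1, 3.0], @[2.3, 2.7]].toTensor

# Default: pca mean-centers the data itself.
let (proj, comps) = pca(x, n_components = 2)

# Equivalent: center manually, then skip centering inside pca.
let centered = x .- x.mean(axis = 0)
let (projPre, compsPre) = pca(centered, n_components = 2, center = false)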
Inputs:
- A matrix of shape [Nb of observations, Nb of features]
- The number of components to keep (default 2, for a 2D projection)
Returns:
- A tuple of the PCA-projected matrix and the principal components matrix:
  - projected: a matrix of shape [Nb of observations, n_components] in descending order of explained variance
  - components: a matrix of shape [Nb of features, n_components] to project new data on the same orthogonal basis
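A usage sketch with made-up data, tying projection and reconstruction together:

import arraymancer

let x = @[@[2.5, 2.4], @[0.5, 0.7], @[2.2, 2.9],
          @[1.9, 2.2], @[3.1, 3.0], @[2.3, 2.7]].toTensor

# Keep a single principal component.
let (projected, components) = pca(x, n_components = 1)
echo projected.shape   # [6, 1]
echo components.shape  # [2, 1]

# Reconstruction per the relation above; approximate because one
# component was discarded (and mean-centered, since pca removed the mean).
let reconstructed = projected * components.transpose()
echo reconstructed.shape  # [6, 2]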
proc pca_detailed[T: SomeFloat](X: Tensor[T]; n_components = 2; center: static bool = true; n_oversamples = 5; n_power_iters = 2): PCA_Detailed[T] {.noinit.}
Principal Component Analysis (PCA) with full details
Project the input data X of shape [Observations, Features] into a new coordinate system where the axes (principal components) are in descending order of explained variance of the original X data, i.e. the first axis explains most of the variance.
The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape [Observations', Features]. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()
PCA requires:
- mean-centered features. This procedure does the centering by default. You can pass center = false if your preprocessing already centers the data.
- Features of the same scale/amplitude. Common options include min-max scaling, mean normalization, standardization (mean 0 and unit variance), and rescaling columns to unit length.
Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.
Inputs:
- A matrix of shape [Nb of observations, Nb of features]
- The number of components to keep (default 2, for a 2D projection)
Returns a "Principal Component Analysis" object with the following fields
- n_observations: The number of observations/samples from an input matrix of shape [n_observations, n_features]
- n_features: The number of features from the input matrix of shape [n_observations, n_features]
- n_components: The number of principal components requested in pca_detailed
- projected: The result of the PCA, of shape [n_observations, n_components], in descending order of explained variance
- components: a matrix of shape [n_features, n_components] to project new data on the same orthogonal basis
- mean: Per-feature empirical mean, equal to input.mean(axis=0)
- explained_variance: a vector of shape n_components, in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
- explained_variance_ratio: a vector of shape n_components, in descending order. Represents the percentage of variance explained by each component.
- singular_values: a vector of shape n_components, in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
- noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. It is equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.
The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D and match the feature column vectors.
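For example, explained_variance_ratio can guide how many components to keep; a sketch with made-up data:

import arraymancer

let x = @[@[2.5, 2.4, 0.1], @[0.5, 0.7, 0.9], @[2.2, 2.9, 0.2],
          @[1.9, 2.2, 0.8], @[3.1, 3.0, 0.3], @[2.3, 2.7, 0.6]].toTensor
let res = pca_detailed(x, n_components = 3)

# Cumulative share of variance explained by the first k components.
var cumulative = 0.0
for k in 0 ..< res.n_components:
  cumulative += res.explained_variance_ratio[k]
  echo "first ", k + 1, " component(s): cumulative explained variance ratio = ", cumulative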