
src/arraymancer/ml/dimensionality_reduction/pca


Types

PCA_Detailed[T] = object
  n_observations*: int
  n_features*: int
  n_components*: int
  projected*: Tensor[T]
  components*: Tensor[T]
  mean*: Tensor[T]
  explained_variance*: Tensor[T]
  explained_variance_ratio*: Tensor[T]
  singular_values*: Tensor[T]
  noise_variance*: T

Principal Component Analysis (PCA) object with full details

Contains the full PCA details from an input matrix of shape n_observations, n_features

  • n_observations: The number of observations/samples in the input matrix of shape n_observations, n_features
  • n_features: The number of features in the input matrix of shape n_observations, n_features
  • n_components: The number of principal components requested in pca_detailed
  • projected: The result of the PCA, of shape n_observations, n_components, in descending order of explained variance
  • components: A matrix of shape n_features, n_components to project new data onto the same orthogonal basis
  • mean: Per-feature empirical mean, equal to input.mean(axis=0)
  • explained_variance: A vector of shape n_components in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
  • explained_variance_ratio: A vector of shape n_components in descending order. Represents the percentage of variance explained by each component.
  • singular_values: A vector of shape n_components in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
  • noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. It is equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D to match the feature column vectors.
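
For intuition, the fields are linked by a standard identity: each explained variance equals the corresponding squared singular value divided by n_observations - 1, assuming the usual sample-covariance convention (that normalization is an assumption, not stated above). A minimal sketch checking it, using Arraymancer's broadcasted *. and /. operators:

  import arraymancer

  let x = randomTensor([50, 4], 1.0)   # made-up data: 50 observations, 4 features
  let p = x.pca_detailed(n_components = 2)

  # Recover the covariance eigenvalues from the singular values:
  # eigenvalue_i = singular_value_i^2 / (n_observations - 1)
  let eigenvalues = (p.singular_values *. p.singular_values) /.
                    float64(p.n_observations - 1)
  echo eigenvalues            # expected to match p.explained_variance
  echo p.explained_variance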


Procs

proc `$`(pca: PCA_Detailed): string
proc pca[T: SomeFloat](X: Tensor[T]; n_components = 2;
                       center: static bool = true; n_oversamples = 5;
                       n_power_iters = 2): tuple[projected: Tensor[T],
    components: Tensor[T]] {.noinit.}

Principal Component Analysis (PCA)

Project the input data X of shape Observations, Features into a new coordinate system where the axes (principal components) are in descending order of explained variance of the original X data, i.e. the first axis explains most of the variance.

The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape Observations', Features. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()

PCA requires:

  • mean-centered features. This procedure does the centering by default. You can pass center = false if your preprocessing already centers the data.
  • Features of the same scale/amplitude. Some alternatives include min-max scaling, mean normalization, standardization (mean = 0 and unit variance), and rescaling columns to unit length; a preprocessing sketch follows the note below.

Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.
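
A sketch of this preprocessing, assuming Arraymancer's broadcasted -. and /. operators and the axis-wise mean and std reductions:

  import arraymancer

  # Made-up raw data: 100 observations of 4 features on very different scales
  let raw = randomTensor([100, 4], 50.0)

  # Standardize every feature to zero mean and unit variance, then skip
  # the built-in centering since the data is already centered
  let scaled = (raw -. raw.mean(axis = 0)) /. raw.std(axis = 0)
  let (projected, components) = scaled.pca(n_components = 2, center = false)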

Inputs:

  • X: a matrix of shape n_observations, n_features
  • n_components: the number of principal components to keep (default 2, e.g. for 2D visualization)
  • center: whether to mean-center the input (static, default: true)
  • n_oversamples, n_power_iters: accuracy parameters of the randomized SVD used internally

Returns:

  • a tuple of (projected, components): projected is a matrix of shape n_observations, n_components in descending order of explained variance, and components is a matrix of shape n_features, n_components to project new data onto the same orthogonal basis
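
A minimal usage sketch (the random input stands in for real data):

  import arraymancer

  let x = randomTensor([10, 3], 1.0)   # 10 observations, 3 features
  let (projected, components) = x.pca(n_components = 2)

  echo projected.shape    # [10, 2]
  echo components.shape   # [3, 2]

  # Approximate reconstruction of the mean-centered input
  let reconstructed = projected * components.transpose()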

proc pca_detailed[T: SomeFloat](X: Tensor[T]; n_components = 2;
                                center: static bool = true; n_oversamples = 5;
                                n_power_iters = 2): PCA_Detailed[T] {.noinit.}

Principal Component Analysis (PCA) with full details

Project the input data X of shape Observations, Features into a new coordinate system where the axes (principal components) are in descending order of explained variance of the original X data, i.e. the first axis explains most of the variance.

The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape Observations', Features. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()

PCA requires:

  • mean-centered features. This procedure does the centering by default. You can pass center = false if your preprocessing already centers the data.
  • Features of the same scale/amplitude. Some alternatives include min-max scaling, mean normalization, standardization (mean = 0 and unit variance), and rescaling columns to unit length.

Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.

Inputs:

  • X: a matrix of shape n_observations, n_features
  • n_components: the number of principal components to keep (default 2, e.g. for 2D visualization)
  • center: whether to mean-center the input (static, default: true)
  • n_oversamples, n_power_iters: accuracy parameters of the randomized SVD used internally

Returns a "Principal Component Analysis" object with the following fields:

  • n_observations: The number of observations/samples in the input matrix of shape n_observations, n_features
  • n_features: The number of features in the input matrix of shape n_observations, n_features
  • n_components: The number of principal components requested in pca_detailed
  • projected: The result of the PCA, of shape n_observations, n_components, in descending order of explained variance
  • components: A matrix of shape n_features, n_components to project new data onto the same orthogonal basis
  • mean: Per-feature empirical mean, equal to input.mean(axis=0)
  • explained_variance: A vector of shape n_components in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
  • explained_variance_ratio: A vector of shape n_components in descending order. Represents the percentage of variance explained by each component.
  • singular_values: A vector of shape n_components in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
  • noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. It is equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D to match the feature column vectors.
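
A usage sketch, assuming the broadcasted -. operator and unsqueeze to align ranks when centering new data with the stored 1D mean:

  import arraymancer

  let x = randomTensor([100, 5], 1.0)   # made-up data
  let p = x.pca_detailed(n_components = 3)

  echo p                            # pretty-printed via the `$` proc above
  echo p.explained_variance_ratio   # share of variance captured per component

  # Project new observations onto the fitted basis, centering them with
  # the stored per-feature mean (1D, hence unsqueeze to shape 1 x n_features)
  let xNew = randomTensor([10, 5], 1.0)
  let projectedNew = (xNew -. p.mean.unsqueeze(0)) * p.components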
