



PCA_Detailed[T] = object
  n_observations*: int
  n_features*: int
  n_components*: int
  projected*: Tensor[T]
  components*: Tensor[T]
  mean*: Tensor[T]
  explained_variance*: Tensor[T]
  explained_variance_ratio*: Tensor[T]
  singular_values*: Tensor[T]
  noise_variance*: T

Principal Component Analysis (PCA) object with full details

Contains the full PCA details from an input matrix of shape n_observations, n_features

  • n_observations: The number of observations/samples from an input matrix of shape n_observations, n_features
  • n_features: The number of features from the input matrix of shape n_observations, n_features
  • n_components: The number of principal components requested from pca_detailed
  • projected: The result of the PCA of shape n_observations, n_components in descending order of explained variance
  • components: a matrix of shape n_features, n_components to project new data on the same orthogonal basis
  • mean: Per-feature empirical mean, equal to input.mean(axis=0)
  • explained_variance: a vector of shape n_components in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
  • explained_variance_ratio: a vector of shape n_components in descending order. Represents the percentage of variance explained by each component.
  • singular_values: a vector of shape n_components in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
  • noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574. It is required to compute the estimated data covariance and score samples. Equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D and match the feature column vectors.
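The field descriptions above can be summarised as formulas. This is a sketch of one reading of those descriptions, not a statement of the implementation. The symbols are introduced here for illustration: \lambda_1 \ge \dots \ge \lambda_m are the eigenvalues of the covariance matrix of X, m = min(n_features, n_samples), k = n_components, n = n_observations and s_i are the singular values. Two points are assumptions rather than facts from the text: that the ratio is taken against the total variance, and that the covariance uses the n - 1 normalisation.

```latex
\begin{align}
  \text{explained\_variance}_i &= \lambda_i, \qquad i = 1, \dots, k \\
  \text{explained\_variance\_ratio}_i &= \frac{\lambda_i}{\sum_{j=1}^{m} \lambda_j}
      \qquad \text{(assumed: ratio to total variance)} \\
  \lambda_i &= \frac{s_i^2}{n - 1}
      \qquad \text{(assumed: unbiased covariance estimate)} \\
  \text{noise\_variance} &= \frac{1}{m - k} \sum_{i=k+1}^{m} \lambda_i
\end{align}
```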



proc `$`(pca: PCA_Detailed): string
proc pca[T: SomeFloat](X: Tensor[T]; n_components = 2;
                       center: static bool = true; n_oversamples = 5;
                       n_power_iters = 2): tuple[projected: Tensor[T],
    components: Tensor[T]] {.noinit.}

Principal Component Analysis (PCA)

Project the input data X of shape Observations, Features into a new coordinate system where axes (principal components) are in descending order of explained variance of the original X data i.e. the first axis explains most of the variance.

The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape Observations', Features. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()

PCA requires:

  • Mean-centered features. This procedure does the centering by default. You can pass "center = false" if your preprocessing already centers the data.
  • Features of the same scale/amplitude. Options for rescaling include min-max scaling, mean normalization, standardization (mean = 0 and unit variance), and rescaling each column to unit length.

Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.
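The description above can be tried with a short script. This is a minimal sketch, assuming that `import arraymancer` exposes `pca` with the signature shown above and that a LAPACK library is available at run time. The data set is synthetic and purely illustrative, and the names `x` and `approx` are hypothetical.

```nim
import std/math
import arraymancer

# Build a small deterministic data set: 100 observations, 8 features.
# The values are arbitrary and only serve to exercise the API.
var x = newTensor[float64]([100, 8])
for i in 0 ..< 100:
  for j in 0 ..< 8:
    x[i, j] = sin(float64(i) * 0.1 * float64(j + 1)) + 0.05 * float64(j)

# Keep the 2 leading principal components; centering is on by default.
let (projected, components) = pca(x, n_components = 2)

echo projected.shape   # documented as Observations x n_components
echo components.shape  # documented as Features x n_components

# Low-rank approximation following the documented formula
# X ~= projected * components.transpose().
# Assumption: with center = true this approximates the mean-centered data.
let approx = projected * components.transpose()
echo approx.shape
```

The feature count is deliberately larger than n_components + n_oversamples so that the default oversampling fits within the matrix dimensions.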



proc pca_detailed[T: SomeFloat](X: Tensor[T]; n_components = 2;
                                center: static bool = true; n_oversamples = 5;
                                n_power_iters = 2): PCA_Detailed[T] {.noinit.}

Principal Component Analysis (PCA) with full details

Project the input data X of shape Observations, Features into a new coordinate system where axes (principal components) are in descending order of explained variance of the original X data i.e. the first axis explains most of the variance.

The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape Observations', Features. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()

PCA requires:

  • Mean-centered features. This procedure does the centering by default. You can pass "center = false" if your preprocessing already centers the data.
  • Features of the same scale/amplitude. Options for rescaling include min-max scaling, mean normalization, standardization (mean = 0 and unit variance), and rescaling each column to unit length.

Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.


Returns a "Principal Component Analysis" object with the following fields

  • n_observations: The number of observations/samples from an input matrix of shape n_observations, n_features
  • n_features: The number of features from the input matrix of shape n_observations, n_features
  • n_components: The number of principal components requested from pca_detailed
  • projected: The result of the PCA of shape n_observations, n_components in descending order of explained variance
  • components: a matrix of shape n_features, n_components to project new data on the same orthogonal basis
  • mean: Per-feature empirical mean, equal to input.mean(axis=0)
  • explained_variance: a vector of shape n_components in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
  • explained_variance_ratio: a vector of shape n_components in descending order. Represents the percentage of variance explained by each component.
  • singular_values: a vector of shape n_components in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
  • noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574. It is required to compute the estimated data covariance and score samples. Equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D and match the feature column vectors.
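The returned object can be inspected field by field. The following is a minimal sketch, assuming that `import arraymancer` exposes `pca_detailed` and the `PCA_Detailed` fields exactly as listed above, and that a LAPACK library is available at run time. The synthetic data and the names `x` and `res` are hypothetical.

```nim
import std/math
import arraymancer

# Synthetic data: 100 observations, 8 features (arbitrary values).
var x = newTensor[float64]([100, 8])
for i in 0 ..< 100:
  for j in 0 ..< 8:
    x[i, j] = sin(float64(i) * 0.1 * float64(j + 1)) + 0.05 * float64(j)

# Request 2 components and keep the full diagnostic object.
let res = pca_detailed(x, n_components = 2)

# Dimensions recorded from the input and the request.
echo res.n_observations, " x ", res.n_features, " -> ", res.n_components

# Per-component diagnostics, documented as sorted in descending order.
echo res.explained_variance
echo res.explained_variance_ratio
echo res.singular_values

# Scalar noise estimate from the Probabilistic PCA model.
echo res.noise_variance

# Whole-object summary through the `$` proc listed in this module.
echo res
```

The explained_variance_ratio field is usually the first thing to check when deciding whether n_components is large enough for a given data set.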
