
src/arraymancer/ml/dimensionality_reduction/pca


Types

PCA_Detailed[T] = object
  n_observations*: int
  n_features*: int
  n_components*: int
  projected*: Tensor[T]
  components*: Tensor[T]
  mean*: Tensor[T]
  explained_variance*: Tensor[T]
  explained_variance_ratio*: Tensor[T]
  singular_values*: Tensor[T]
  noise_variance*: T

Principal Component Analysis (PCA) object with full details

Contains the full PCA details from an input matrix of shape n_observations, n_features

  • n_observations: The number of observations/samples in the input matrix of shape n_observations, n_features
  • n_features: The number of features in the input matrix of shape n_observations, n_features
  • n_components: The number of principal components requested in pca_detailed
  • projected: The result of the PCA, of shape n_observations, n_components, in descending order of explained variance
  • components: A matrix of shape n_features, n_components to project new data onto the same orthogonal basis
  • mean: Per-feature empirical mean, equal to input.mean(axis=0)
  • explained_variance: A vector of shape n_components in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
  • explained_variance_ratio: A vector of shape n_components in descending order. Represents the percentage of variance explained by each component.
  • singular_values: A vector of shape n_components in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
  • noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. It is equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D to match the feature column vectors.
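
For intuition, the fields are linked by a standard identity: each explained variance equals the corresponding squared singular value divided by n_observations - 1, assuming the usual sample-covariance convention (that normalization is an assumption, not stated above). A minimal sketch checking it, using Arraymancer's broadcasted *. and /. operators:

  import arraymancer

  let x = randomTensor([50, 4], 1.0)   # made-up data: 50 observations, 4 features
  let p = x.pca_detailed(n_components = 2)

  # Recover the covariance eigenvalues from the singular values:
  # eigenvalue_i = singular_value_i^2 / (n_observations - 1)
  let eigenvalues = (p.singular_values *. p.singular_values) /.
                    float64(p.n_observations - 1)
  echo eigenvalues            # expected to match p.explained_variance
  echo p.explained_variance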


Procs

proc `$`(pca: PCA_Detailed): string
proc pca[T: SomeFloat](X: Tensor[T]; n_components = 2;
                       center: static bool = true; n_oversamples = 5;
                       n_power_iters = 2): tuple[projected: Tensor[T],
    components: Tensor[T]] {.noinit.}

Principal Component Analysis (PCA)

Project the input data X of shape Observations, Features into a new coordinate system where the axes (principal components) are in descending order of explained variance of the original X data, i.e. the first axis explains most of the variance.

The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape Observations', Features. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()

PCA requires:

  • mean-centered features. This procedure does the centering by default. You can pass center = false if your preprocessing already centers the data.
  • Features of the same scale/amplitude. Some alternatives include min-max scaling, mean normalization, standardization (mean = 0 and unit variance), and rescaling columns to unit length; a preprocessing sketch follows the note below.

Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.
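
A sketch of this preprocessing, assuming Arraymancer's broadcasted -. and /. operators and the axis-wise mean and std reductions:

  import arraymancer

  # Made-up raw data: 100 observations of 4 features on very different scales
  let raw = randomTensor([100, 4], 50.0)

  # Standardize every feature to zero mean and unit variance, then skip
  # the built-in centering since the data is already centered
  let scaled = (raw -. raw.mean(axis = 0)) /. raw.std(axis = 0)
  let (projected, components) = scaled.pca(n_components = 2, center = false)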

Inputs:

  • X: a matrix of shape n_observations, n_features
  • n_components: the number of principal components to keep (default 2, e.g. for 2D visualization)
  • center: whether to mean-center the input (static, default: true)
  • n_oversamples, n_power_iters: accuracy parameters of the randomized SVD used internally

Returns:

  • a tuple of (projected, components): projected is a matrix of shape n_observations, n_components in descending order of explained variance, and components is a matrix of shape n_features, n_components to project new data onto the same orthogonal basis
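
A minimal usage sketch (the random input stands in for real data):

  import arraymancer

  let x = randomTensor([10, 3], 1.0)   # 10 observations, 3 features
  let (projected, components) = x.pca(n_components = 2)

  echo projected.shape    # [10, 2]
  echo components.shape   # [3, 2]

  # Approximate reconstruction of the mean-centered input
  let reconstructed = projected * components.transpose()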

proc pca_detailed[T: SomeFloat](X: Tensor[T]; n_components = 2;
                                center: static bool = true; n_oversamples = 5;
                                n_power_iters = 2): PCA_Detailed[T] {.noinit.}

Principal Component Analysis (PCA) with full details

Project the input data X of shape Observations, Features into a new coordinate system where the axes (principal components) are in descending order of explained variance of the original X data, i.e. the first axis explains most of the variance.

The rotated components matrix can be used to project new observations onto the same basis: X' * components, with X' of shape Observations', Features. X' must be mean-centered. Its transpose can be used to reconstruct the original X: X ~= projected * components.transpose()

PCA requires:

  • mean-centered features. This procedure does the centering by default. You can pass center = false if your preprocessing already centers the data.
  • Features of the same scale/amplitude. Some alternatives include min-max scaling, mean normalization, standardization (mean = 0 and unit variance), and rescaling columns to unit length.

Note: PCA without centering is also called truncated SVD, which is useful when centering is costly, for example in the case of sparse matrices from parsing text.

Inputs:

  • X: a matrix of shape n_observations, n_features
  • n_components: the number of principal components to keep (default 2, e.g. for 2D visualization)
  • center: whether to mean-center the input (static, default: true)
  • n_oversamples, n_power_iters: accuracy parameters of the randomized SVD used internally

Returns a "Principal Component Analysis" object with the following fields:

  • n_observations: The number of observations/samples in the input matrix of shape n_observations, n_features
  • n_features: The number of features in the input matrix of shape n_observations, n_features
  • n_components: The number of principal components requested in pca_detailed
  • projected: The result of the PCA, of shape n_observations, n_components, in descending order of explained variance
  • components: A matrix of shape n_features, n_components to project new data onto the same orthogonal basis
  • mean: Per-feature empirical mean, equal to input.mean(axis=0)
  • explained_variance: A vector of shape n_components in descending order. Represents the amount of variance explained by each component. It is equal to the n_components largest eigenvalues of the covariance matrix of X.
  • explained_variance_ratio: A vector of shape n_components in descending order. Represents the percentage of variance explained by each component.
  • singular_values: A vector of shape n_components in descending order. The singular values corresponding to each component. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
  • noise_variance: The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. It is equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

The outputs mean, explained_variance, explained_variance_ratio and singular_values are squeezed to 1D to match the feature column vectors.
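
A usage sketch, assuming the broadcasted -. operator and unsqueeze to align ranks when centering new data with the stored 1D mean:

  import arraymancer

  let x = randomTensor([100, 5], 1.0)   # made-up data
  let p = x.pca_detailed(n_components = 3)

  echo p                            # pretty-printed via the `$` proc above
  echo p.explained_variance_ratio   # share of variance captured per component

  # Project new observations onto the fitted basis, centering them with
  # the stored per-feature mean (1D, hence unsqueeze to shape 1 x n_features)
  let xNew = randomTensor([10, 5], 1.0)
  let projectedNew = (xNew -. p.mean.unsqueeze(0)) * p.components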
