de.unihalle.informatik.MiToBo.math.statistics
Class PCA

java.lang.Object
  extended by de.unihalle.informatik.Alida.operator.ALDOperator
      extended by de.unihalle.informatik.MiToBo.core.operator.MTBOperator
          extended by de.unihalle.informatik.MiToBo.math.statistics.PCA
All Implemented Interfaces:
de.unihalle.informatik.Alida.datatypes.ALDConfigurationValidator, de.unihalle.informatik.Alida.operator.events.ALDOperatorExecutionProgressEventListener, EventListener

@ALDAOperator(genericExecutionMode=ALL,
              level=STANDARD,
              allowBatchMode=false)
public class PCA
extends MTBOperator

This class implements the Karhunen-Loeve transformation, also known as PCA.

Given a data matrix A where each column contains a (mean-free) data vector, first the covariance matrix of the data, i.e., $A\cdot A^T$, is calculated, omitting the constant scaling by the number of samples. Then the eigenvalues and eigenvectors of this matrix are computed according to

$$A\cdot A^T\cdot\vec{v} = \lambda\cdot\vec{v}.$$
A subset of the eigenvectors corresponding to the largest eigenvalues is then selected according to the given dimension reduction mode. These eigenvectors form the basis of a new vector space with reduced dimensionality. Finally, the mean-free input data is projected into this sub-space, and the projected data is the result of the operator.
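
The eigen-decomposition step described above can be sketched in plain Java. The fields documented below use Jama matrices, but to stay self-contained this sketch instead implements the classic cyclic Jacobi method for real symmetric matrices such as $A\cdot A^T$, so it should not be read as the operator's actual implementation. The class name SymmetricEigenSketch is hypothetical; it only mirrors the documented conventions (eigenvalues in ascending order, one eigenvector per column).

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical helper (not part of MiToBo): cyclic Jacobi eigen-decomposition
 * of a real symmetric matrix. Eigenvalues ascending, eigenvector j in column j.
 */
final class SymmetricEigenSketch {
    final double[] values;     // eigenvalues, ascending
    final double[][] vectors;  // vectors[r][j] = r-th entry of eigenvector j

    SymmetricEigenSketch(double[][] sym) {
        int n = sym.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = sym[i].clone();
        double[][] v = new double[n][n];
        for (int i = 0; i < n; i++) v[i][i] = 1.0;

        double scale = 0.0;  // squared Frobenius norm, used for a relative stopping test
        for (double[] row : a) for (double x : row) scale += x * x;

        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0.0;  // squared norm of the upper off-diagonal part
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off <= 1e-24 * scale) break;

            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Rotation angle that annihilates a[p][q] (smaller root for stability).
                    double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                    double t = (theta >= 0 ? 1.0 : -1.0)
                            / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
                    double c = 1.0 / Math.sqrt(t * t + 1.0), s = t * c;
                    for (int k = 0; k < n; k++) {  // A <- A*J (columns p, q)
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - s * akq;
                        a[k][q] = s * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) {  // A <- J^T*A (rows p, q)
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - s * aqk;
                        a[q][k] = s * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) {  // V <- V*J accumulates the eigenvectors
                        double vkp = v[k][p], vkq = v[k][q];
                        v[k][p] = c * vkp - s * vkq;
                        v[k][q] = s * vkp + c * vkq;
                    }
                }
            }
        }

        // Sort eigenpairs by ascending eigenvalue (diagonal of the rotated matrix).
        double[] diag = new double[n];
        for (int i = 0; i < n; i++) diag[i] = a[i][i];
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> diag[i]));
        values = new double[n];
        vectors = new double[n][n];
        for (int j = 0; j < n; j++) {
            values[j] = diag[idx[j]];
            for (int r = 0; r < n; r++) vectors[r][j] = v[r][idx[j]];
        }
    }
}
```

Because the input is symmetric, the accumulated Jacobi rotations yield orthonormal eigenvectors; with ascending order the vector of the largest eigenvalue ends up in the last column.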

If the dimensionality of the data is larger than the number of available samples, i.e., the input data matrix has more rows than columns, the calculations are simplified by using the matrix $A^T\cdot A$ instead of the covariance matrix, which is the larger of the two in this case. For the eigenvectors and eigenvalues of this matrix the following equation holds:

$$A^T\cdot A\cdot\vec{w} = \lambda\cdot\vec{w} \quad\Longrightarrow\quad A\cdot A^T\cdot(A\cdot\vec{w}) = \lambda\cdot(A\cdot\vec{w}).$$
In detail, the eigenvectors $\vec{w}$ of $A^T\cdot A$ can be used to calculate the eigenvectors $\vec{v} = A\cdot\vec{w}$ of the covariance matrix, which belong to the same nonzero eigenvalues $\lambda$, without explicitly solving the eigenvalue problem for the larger matrix.
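
A minimal sketch of this mapping, assuming A is stored as a[row][column] with one sample per column and that $\vec{w}$ is an eigenvector of $A^T\cdot A$ obtained elsewhere. The class and method names are hypothetical. The result is normalized because $\|A\cdot\vec{w}\|^2 = \vec{w}^T A^T A\,\vec{w} = \lambda\,\|\vec{w}\|^2$, so $A\cdot\vec{w}$ is generally not of unit length.

```java
/**
 * Hypothetical helper (not MiToBo API) for the A^T*A shortcut.
 * The matrix A is stored as a[row][column], one sample per column.
 */
final class GramTrickSketch {

    /**
     * Maps an eigenvector w of A^T*A (eigenvalue lambda > 0) to the unit-length
     * eigenvector v = A*w / |A*w| of A*A^T for the same eigenvalue.
     */
    static double[] lift(double[][] a, double[] w) {
        int d = a.length, n = a[0].length;
        double[] v = new double[d];
        double norm2 = 0.0;
        for (int r = 0; r < d; r++) {
            double s = 0.0;
            for (int c = 0; c < n; c++) s += a[r][c] * w[c];
            v[r] = s;
            norm2 += s * s;
        }
        double norm = Math.sqrt(norm2);  // equals sqrt(lambda) * |w|
        if (norm == 0.0) throw new IllegalArgumentException("A*w vanishes (eigenvalue 0)");
        for (int r = 0; r < d; r++) v[r] /= norm;
        return v;
    }
}
```

For example, with the three-dimensional samples $(1,0,1)^T$ and $(0,1,1)^T$ as columns, $A^T\cdot A$ is the 2x2 matrix with rows (2, 1) and (1, 2); its eigenvector $(1,1)^T$ for $\lambda = 3$ lifts to $(1,1,2)^T/\sqrt{6}$, an eigenvector of the 3x3 matrix $A\cdot A^T$ for the same eigenvalue.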

Author:
moeller

Nested Class Summary
static class PCA.ReductionMode
          Available modes for determining the sub-space dimensionality.
 
Nested classes/interfaces inherited from class de.unihalle.informatik.Alida.operator.ALDOperator
de.unihalle.informatik.Alida.operator.ALDOperator.HidingMode
 
Field Summary
protected  Jama.Matrix C
          Covariance matrix calculated from mean-free data.
protected  int dataDim
          Dimensionality of the input data.
protected  double[] eigenVals
          Set of computed eigenvalues.
protected  Jama.Matrix eigenVects
          Matrix of eigenvectors, each column containing a vector.
protected  double[] mean
          Average vector of input dataset.
protected  double[][] meanfreeData
          Normalized, i.e., mean-free, dataset.
protected  Jama.Matrix meanfreeDataMatrix
          Normalized, i.e., mean-free, data matrix.
protected  Jama.Matrix P_t
          The final transformation matrix to be used for dimension reduction.
protected  int sampleCount
          Number of data samples in input data.
protected  int subDim
          Dimensionality of the sub-space as either specified by the user or automatically determined based on the percentage of variance.
 
Fields inherited from class de.unihalle.informatik.Alida.operator.ALDOperator
completeDAG, name, operatorExecutionEventlistenerList, portHashAccess, verbose, versionProvider
 
Constructor Summary
PCA()
          Default constructor.
 
Method Summary
protected  void calculateCovarianceMatrixAndEigenstuff()
          Calculates covariance matrix and eigenvalues and -vectors.
protected  void calculateMeanFreeData()
          Computes the average data vector and makes data mean-free.
protected  void determineSubspaceDimension()
          Determines desired sub-space dimensionality according to selected mode.
protected  void doDimensionReduction()
          Does the actual dimension reduction by data projection into sub-space.
protected  void examineDataset()
          Extracts number of samples and their dimension from dataset.
 double[] getEigenvalues()
          Get calculated eigenvalues in ascending order.
 double[][] getEigenvects()
          Get calculated eigenvectors, one vector per column, sorted by ascending eigenvalue.
 double[][] getResultData()
          Get the transformed dataset.
protected  void operate()
          Performs the PCA on the specified dataset.
 void setDataset(double[][] ds)
          Specify an input dataset.
 void setMeanFreeData(boolean b)
          Set the flag indicating whether the data is already mean-free.
 void setNumberOfComponents(int compNum)
          Set the number of sub-space components to use if the reduction mode is NUMBER_COMPONENTS.
 void setPercentageOfVariance(double p)
          Set the fraction of variance to be represented in the sub-space if the reduction mode is PERCENTAGE_VARIANCE.
 void setReductionMode(PCA.ReductionMode rm)
          Specify the mode for selecting the sub-space dimensionality.
 
Methods inherited from class de.unihalle.informatik.MiToBo.core.operator.MTBOperator
readResolve
 
Methods inherited from class de.unihalle.informatik.Alida.operator.ALDOperator
addOperatorExecutionProgressEventListener, fieldContained, fireOperatorExecutionProgressEvent, getALDPortHashAccessKey, getConstructionMode, getHidingMode, getInInoutNames, getInInoutNames, getInNames, getInOutNames, getMissingRequiredInputs, getName, getNumParameters, getOutInoutNames, getOutNames, getParameter, getParameterDescriptor, getParameterNames, getSupplementalNames, getVerbose, getVersion, handleOperatorExecutionProgressEvent, isConfigured, print, print, print, printInterface, printInterface, readHistory, reinitializeParameterDescriptors, removeOperatorExecutionProgressEventListener, runOp, runOp, runOp, setConstructionMode, setHidingMode, setName, setParameter, setVerbose, toStringVerbose, unconfiguredItems, validate, validateCustom, validateGeneric, writeHistory, writeHistory, writeHistory
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dataDim

protected int dataDim
Dimensionality of the input data.


sampleCount

protected int sampleCount
Number of data samples in input data.


mean

protected transient double[] mean
Average vector of input dataset.


meanfreeData

protected transient double[][] meanfreeData
Normalized, i.e., mean-free, dataset.


meanfreeDataMatrix

protected transient Jama.Matrix meanfreeDataMatrix
Normalized, i.e., mean-free, data matrix.


C

protected transient Jama.Matrix C
Covariance matrix calculated from mean-free data.

The scaling by the number of samples is omitted here, since it only multiplies all eigenvalues by a constant factor and leaves the eigenvectors unchanged.
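
Writing $N$ for the number of samples (sampleCount), this can be checked in one line: scaling the matrix by $1/N$ scales every eigenvalue by the same factor, keeps every eigenvector, and leaves the relative share of each eigenvalue untouched.

```latex
\frac{1}{N}\,A\cdot A^T\cdot\vec{v} = \mu\cdot\vec{v}
\quad\Longleftrightarrow\quad
A\cdot A^T\cdot\vec{v} = (N\mu)\cdot\vec{v},
\qquad
\frac{\mu_i}{\sum_j \mu_j} = \frac{N\mu_i}{\sum_j N\mu_j}.
```

The second identity shows that fractions of variance, as used in the PERCENTAGE_VARIANCE mode, are not affected by the omitted factor either.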


eigenVals

protected transient double[] eigenVals
Set of computed eigenvalues.

Note that the values are in ascending order.


eigenVects

protected transient Jama.Matrix eigenVects
Matrix of eigenvectors, each column containing a vector.

The vectors are sorted according to their eigenvalues, i.e., the vector corresponding to the largest eigenvalue can be found in the last column.


subDim

protected transient int subDim
Dimensionality of the sub-space as either specified by the user or automatically determined based on the percentage of variance.


P_t

protected transient Jama.Matrix P_t
The final transformation matrix to be used for dimension reduction.

This matrix is already transposed, i.e., each row contains a sub-space basis vector and the number of rows is equal to the dimension of the sub-space.
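
A sketch of how such a matrix can be assembled from eigenvectors stored in ascending order (largest eigenvalue in the last column) and then applied to mean-free data. The class and method names, and the assumption that the projected result holds one reduced sample per column, are illustrative and not taken from MiToBo.

```java
/** Hypothetical sketch (not MiToBo code) of building P_t and projecting data. */
final class ProjectionSketch {

    /**
     * Builds the transposed transformation matrix: row i holds the eigenvector
     * of the i-th largest eigenvalue. eigVects has one eigenvector per column,
     * sorted by ascending eigenvalue.
     */
    static double[][] buildPt(double[][] eigVects, int subDim) {
        int d = eigVects.length, n = eigVects[0].length;
        if (subDim < 1 || subDim > n) throw new IllegalArgumentException("invalid subDim");
        double[][] pt = new double[subDim][d];
        for (int i = 0; i < subDim; i++) {
            int col = n - 1 - i;  // walk backwards from the last column
            for (int r = 0; r < d; r++) pt[i][r] = eigVects[r][col];
        }
        return pt;
    }

    /** Projects mean-free data (one sample per column) into the sub-space: Y = P_t * A. */
    static double[][] project(double[][] pt, double[][] meanFree) {
        int k = pt.length, d = meanFree.length, n = meanFree[0].length;
        double[][] y = new double[k][n];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int r = 0; r < d; r++) s += pt[i][r] * meanFree[r][j];
                y[i][j] = s;
            }
        return y;
    }
}
```

Storing the basis vectors as rows means the projection is a single matrix product, which matches the note that P_t is already transposed.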

Constructor Detail

PCA

public PCA()
    throws de.unihalle.informatik.Alida.exceptions.ALDOperatorException
Default constructor.

Throws:
de.unihalle.informatik.Alida.exceptions.ALDOperatorException

Method Detail

setDataset

public void setDataset(double[][] ds)
Specify an input dataset.

Parameters:
ds - Dataset to process.

setMeanFreeData

public void setMeanFreeData(boolean b)
Set the flag indicating whether the data is already mean-free.

Parameters:
b - If true, the input data is assumed to be mean-free already.

setReductionMode

public void setReductionMode(PCA.ReductionMode rm)
Specify the mode for selecting the sub-space dimensionality.

Parameters:
rm - Mode for dimension reduction.

setNumberOfComponents

public void setNumberOfComponents(int compNum)
Set the number of sub-space components to use if the reduction mode is NUMBER_COMPONENTS.

Parameters:
compNum - Number of components, i.e., eigenvectors, to use.

setPercentageOfVariance

public void setPercentageOfVariance(double p)
Set the fraction of variance to be represented in the sub-space if the reduction mode is PERCENTAGE_VARIANCE.

Parameters:
p - Fraction of variance to represent in sub-space.

getResultData

public double[][] getResultData()
Get the transformed dataset.

Returns:
Resulting dataset.

getEigenvalues

public double[] getEigenvalues()
Get calculated eigenvalues in ascending order.

Returns:
Set of eigenvalues, null if calculations are not yet completed.

getEigenvects

public double[][] getEigenvects()
Get calculated eigenvectors, one vector per column, sorted by ascending eigenvalue.

Returns:
Set of eigenvectors, null if calculations are not yet completed.

operate

protected void operate()
Performs the PCA on the specified dataset.

Specified by:
operate in class de.unihalle.informatik.Alida.operator.ALDOperator

examineDataset

protected void examineDataset()
Extracts number of samples and their dimension from dataset.


calculateMeanFreeData

protected void calculateMeanFreeData()
Computes the average data vector and makes data mean-free.
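
A minimal sketch of this step, assuming the dataset is stored as ds[dimension][sample], i.e., one data vector per column as in the class description. The array layout actually expected by setDataset is not stated on this page, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of centering; the layout ds[dimension][sample] is an assumption. */
final class MeanFreeSketch {

    /** Average vector over all samples (columns). */
    static double[] mean(double[][] ds) {
        int d = ds.length, n = ds[0].length;
        double[] m = new double[d];
        for (int r = 0; r < d; r++) {
            double s = 0.0;
            for (int c = 0; c < n; c++) s += ds[r][c];
            m[r] = s / n;
        }
        return m;
    }

    /** Returns a copy of ds with the mean vector subtracted from every sample. */
    static double[][] center(double[][] ds, double[] m) {
        double[][] out = new double[ds.length][];
        for (int r = 0; r < ds.length; r++) {
            out[r] = new double[ds[r].length];
            for (int c = 0; c < ds[r].length; c++) out[r][c] = ds[r][c] - m[r];
        }
        return out;
    }
}
```

The sketch returns a new array so the caller's input stays untouched.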


calculateCovarianceMatrixAndEigenstuff

protected void calculateCovarianceMatrixAndEigenstuff()
Calculates covariance matrix and eigenvalues and -vectors.
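
The choice between $A\cdot A^T$ and the smaller $A^T\cdot A$ described in the class comment can be sketched as follows. The class and method names are hypothetical, and the layout a[row][column] with one mean-free sample per column is an assumption. Both products are symmetric, so only the upper triangle is computed and then mirrored.

```java
/** Hypothetical sketch: picks the smaller of A*A^T and A^T*A, as described for PCA. */
final class ScatterMatrixSketch {

    /**
     * Returns A*A^T (dataDim x dataDim) if dataDim <= sampleCount,
     * otherwise A^T*A (sampleCount x sampleCount). Scaling by 1/N is omitted.
     */
    static double[][] smallerScatterMatrix(double[][] a) {
        int d = a.length, n = a[0].length;
        boolean useGram = d > n;  // more rows than columns
        int m = useGram ? n : d;
        double[][] s = new double[m][m];
        for (int i = 0; i < m; i++)
            for (int j = i; j < m; j++) {
                double sum = 0.0;
                if (useGram) {
                    for (int r = 0; r < d; r++) sum += a[r][i] * a[r][j];  // columns i, j
                } else {
                    for (int c = 0; c < n; c++) sum += a[i][c] * a[j][c];  // rows i, j
                }
                s[i][j] = sum;
                s[j][i] = sum;
            }
        return s;
    }
}
```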


determineSubspaceDimension

protected void determineSubspaceDimension()
Determines desired sub-space dimensionality according to selected mode.
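
A hedged sketch of both modes. ModeSketch is a stand-in that only borrows the constant names NUMBER_COMPONENTS and PERCENTAGE_VARIANCE from this page; the clamping of the requested component count and the "at least the fraction p" criterion are assumptions, not documented behavior of PCA. Since the eigenvalues are stored in ascending order, the largest ones are accumulated from the end of the array.

```java
/** Hypothetical stand-in for PCA.ReductionMode; only the constant names come from the docs. */
enum ModeSketch { NUMBER_COMPONENTS, PERCENTAGE_VARIANCE }

/** Hypothetical sketch of selecting the sub-space dimensionality. */
final class SubspaceDimSketch {

    /**
     * eigenVals must be sorted ascending. For PERCENTAGE_VARIANCE this returns the
     * smallest k whose k largest eigenvalues cover at least the fraction p of their sum.
     */
    static int subspaceDim(double[] eigenVals, ModeSketch mode, int compNum, double p) {
        int n = eigenVals.length;
        if (mode == ModeSketch.NUMBER_COMPONENTS) {
            return Math.max(1, Math.min(compNum, n));  // clamping is an assumption
        }
        double total = 0.0;
        for (double ev : eigenVals) total += ev;
        double acc = 0.0;
        for (int k = 1; k <= n; k++) {
            acc += eigenVals[n - k];  // add the k-th largest eigenvalue
            if (acc >= p * total) return k;
        }
        return n;  // fallback, e.g. if rounding keeps acc just below p * total
    }
}
```

For example, with eigenvalues {1, 2, 3, 4} and p = 0.5, the two largest eigenvalues (4 + 3 = 7 of 10) already suffice, so the sketch returns 2.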


doDimensionReduction

protected void doDimensionReduction()
Does the actual dimension reduction by data projection into sub-space.



Copyright © 2010–2015 Martin Luther University Halle-Wittenberg. All rights reserved.