java.lang.Object
  de.unihalle.informatik.Alida.operator.ALDOperator
    de.unihalle.informatik.MiToBo.core.operator.MTBOperator
      de.unihalle.informatik.MiToBo.math.statistics.PCA
@ALDAOperator(genericExecutionMode=ALL, level=STANDARD, allowBatchMode=false) public class PCA
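This class implements the Karhunen-Loeve transformation, also known as PCA.
Given a data matrix $A$ where each column contains a data vector, the data are
first made mean-free. Then the covariance matrix of the data, i.e., $C = A A^T$
with $A$ denoting the mean-free data (the constant factor $1/N$ for $N$ samples
is omitted), is calculated. Then the eigenvalues $\lambda$ and eigenvectors
$\mathbf{v}$ of this matrix are computed according to
$$C \mathbf{v} = \lambda \mathbf{v}.$$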
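If the dimensionality of the data is larger than the number of available
samples, i.e., the input data matrix has more rows than columns, the
calculations are simplified by using the matrix $C' = A^T A$ instead of the
covariance matrix $C = A A^T$, which is larger in this case. For the
eigenvectors $\mathbf{v}'$ and eigenvalues $\lambda'$ of this matrix the
following equation holds:
$$A^T A \mathbf{v}' = \lambda' \mathbf{v}'.$$
Multiplying both sides by $A$ from the left gives
$A A^T (A \mathbf{v}') = \lambda' (A \mathbf{v}')$, i.e., $A \mathbf{v}'$ is an
eigenvector of $C$ with the same eigenvalue $\lambda'$ and only needs to be
normalized to unit length.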
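The eigenvector relation for the smaller matrix can be checked numerically. The following is a minimal, dependency-free Java sketch, not the MiToBo implementation: it assumes the matrix is stored as a[dimension][sample], uses plain power iteration (a simple stand-in for a full eigen-solver) to obtain the dominant eigenvector v' of the small N x N matrix A^T A, and maps it to A v'. The class SmallMatrixTrick and its methods are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

/**
 * Illustrative sketch (hypothetical names, not MiToBo code). Layout
 * assumption: a[i][k] is dimension i of sample k, so A is d x N.
 * If A^T A v' = lambda' v', then A A^T (A v') = lambda' (A v').
 */
final class SmallMatrixTrick {

    /** Small N x N matrix A^T A. */
    static double[][] gram(double[][] a) {
        int d = a.length, n = a[0].length;
        double[][] g = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < d; k++)
                    g[i][j] += a[k][i] * a[k][j];
        return g;
    }

    /** Large d x d matrix A A^T (only needed here to verify the result). */
    static double[][] outer(double[][] a) {
        int d = a.length, n = a[0].length;
        double[][] c = new double[d][d];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * a[j][k];
        return c;
    }

    /** Matrix-vector product m * v. */
    static double[] mul(double[][] m, double[] v) {
        double[] r = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++)
                r[i] += m[i][j] * v[j];
        return r;
    }

    /** Scales v to unit Euclidean length in place and returns it. */
    static double[] normalize(double[] v) {
        double s = 0;
        for (double x : v) s += x * x;
        double len = Math.sqrt(s);
        for (int i = 0; i < v.length; i++) v[i] /= len;
        return v;
    }

    /**
     * Dominant eigenvector of a symmetric matrix by power iteration; assumes a
     * unique largest eigenvalue and a start vector not orthogonal to it.
     */
    static double[] dominantEigenvector(double[][] s, int iterations) {
        double[] v = new double[s.length];
        Arrays.fill(v, 1.0);
        for (int it = 0; it < iterations; it++) v = normalize(mul(s, v));
        return v;
    }

    /** Rayleigh quotient v^T s v of a unit vector v, i.e., its eigenvalue. */
    static double rayleigh(double[][] s, double[] v) {
        double[] sv = mul(s, v);
        double r = 0;
        for (int i = 0; i < v.length; i++) r += v[i] * sv[i];
        return r;
    }

    /** Maps an eigenvector v' of A^T A to the unit eigenvector A v' of A A^T. */
    static double[] lift(double[][] a, double[] vSmall) {
        return normalize(mul(a, vSmall));
    }
}
```

For example, with the 3 x 2 matrix a = {{1, 2}, {3, 4}, {5, 6}} only a 2 x 2 matrix has to be decomposed, yet lift(a, v) is a unit eigenvector of the 3 x 3 matrix A A^T with the same eigenvalue.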
Nested Class Summary | |
---|---|
static class | PCA.ReductionMode: Available modes for determining the sub-space dimensionality. |
Nested classes/interfaces inherited from class de.unihalle.informatik.Alida.operator.ALDOperator |
---|
de.unihalle.informatik.Alida.operator.ALDOperator.HidingMode |
Field Summary | |
---|---|
protected Jama.Matrix | C: Covariance matrix calculated from mean-free data. |
protected int | dataDim: Dimensionality of the input data. |
protected double[] | eigenVals: Set of computed eigenvalues. |
protected Jama.Matrix | eigenVects: Matrix of eigenvectors, each column containing a vector. |
protected double[] | mean: Average vector of input dataset. |
protected double[][] | meanfreeData: Normalized, i.e., mean-free, dataset. |
protected Jama.Matrix | meanfreeDataMatrix: Normalized, i.e., mean-free, data matrix. |
protected Jama.Matrix | P_t: The final transformation matrix to be used for dimension reduction. |
protected int | sampleCount: Number of data samples in input data. |
protected int | subDim: Dimensionality of the sub-space as either specified by the user or automatically determined based on the percentage of variance. |
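The fields mean, meanfreeData and C are related as sketched below. This is an illustrative Java sketch with hypothetical names, not the class's own code: it uses plain arrays instead of Jama.Matrix, assumes the layout data[dimension][sample] (each column is one sample, as in the class description), and omits the 1/N factor as described for C.

```java
/**
 * Hypothetical sketch of mean vector, mean-free data and unscaled
 * covariance for a d x N data matrix stored as data[dimension][sample].
 */
final class MeanFreeCovariance {

    /** Average vector: mean[i] = (1/N) * sum over samples k of data[i][k]. */
    static double[] mean(double[][] data) {
        int d = data.length, n = data[0].length;
        double[] m = new double[d];
        for (int i = 0; i < d; i++) {
            for (int k = 0; k < n; k++) m[i] += data[i][k];
            m[i] /= n;
        }
        return m;
    }

    /** Subtracts the mean vector from every sample (column). */
    static double[][] meanFree(double[][] data, double[] m) {
        int d = data.length, n = data[0].length;
        double[][] a = new double[d][n];
        for (int i = 0; i < d; i++)
            for (int k = 0; k < n; k++)
                a[i][k] = data[i][k] - m[i];
        return a;
    }

    /**
     * Unscaled covariance C = A A^T (d x d) of mean-free A. Omitting 1/N scales
     * all eigenvalues by the same factor and leaves the eigenvectors unchanged.
     */
    static double[][] covariance(double[][] a) {
        int d = a.length, n = a[0].length;
        double[][] c = new double[d][d];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * a[j][k];
        return c;
    }
}
```

For data {{1, 2, 3}, {2, 4, 6}} (two dimensions, three samples) this gives the mean {2, 4} and C = {{2, 4}, {4, 8}}.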
Fields inherited from class de.unihalle.informatik.Alida.operator.ALDOperator |
---|
completeDAG, name, operatorExecutionEventlistenerList, portHashAccess, verbose, versionProvider |
Constructor Summary | |
---|---|
PCA() | Default constructor. |
Method Summary | |
---|---|
protected void | calculateCovarianceMatrixAndEigenstuff(): Calculates the covariance matrix and its eigenvalues and eigenvectors. |
protected void | calculateMeanFreeData(): Computes the average data vector and makes the data mean-free. |
protected void | determineSubspaceDimension(): Determines the desired sub-space dimensionality according to the selected mode. |
protected void | doDimensionReduction(): Does the actual dimension reduction by projecting the data into the sub-space. |
protected void | examineDataset(): Extracts the number of samples and their dimension from the dataset. |
double[] | getEigenvalues(): Get the calculated eigenvalues in ascending order. |
double[][] | getEigenvects(): Get the calculated eigenvectors, one vector per column, sorted by ascending eigenvalue. |
double[][] | getResultData(): Get the transformed dataset. |
protected void | operate(): This method does the actual work. |
void | setDataset(double[][] ds): Specify an input dataset. |
void | setMeanFreeData(boolean b): Set a flag indicating whether the data is already mean-free. |
void | setNumberOfComponents(int compNum): Number of sub-space components if the reduction mode is NUMBER_COMPONENTS. |
void | setPercentageOfVariance(double p): Fraction of variance to be represented in the sub-space if the reduction mode is PERCENTAGE_VARIANCE. |
void | setReductionMode(PCA.ReductionMode rm): Specify the mode for selecting the sub-space dimensionality. |
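For the PERCENTAGE_VARIANCE mode, one plausible rule is to keep the largest eigenvalues until their sum reaches the requested fraction of the total. The sketch below assumes exactly that rule, non-negative eigenvalues in ascending order (as returned by getEigenvalues()), and a threshold test with >=; the precise criterion used by determineSubspaceDimension() is not documented here, and SubspaceDimension is a hypothetical name.

```java
/**
 * Hypothetical sketch of a variance-fraction rule for the sub-space
 * dimensionality; not taken from the PCA implementation.
 */
final class SubspaceDimension {

    /**
     * Returns the smallest number of components whose eigenvalues sum to at
     * least the fraction p (0 < p <= 1) of the total variance. Eigenvalues are
     * expected in ascending order and non-negative, so the largest ones are
     * accumulated from the end of the array.
     */
    static int forVarianceFraction(double[] ascendingEigenvalues, double p) {
        double total = 0;
        for (double l : ascendingEigenvalues) total += l;
        double covered = 0;
        int k = 0;
        for (int i = ascendingEigenvalues.length - 1; i >= 0; i--) {
            covered += ascendingEigenvalues[i];
            k++;
            if (covered >= p * total) break;
        }
        return k;
    }
}
```

With eigenvalues {0.5, 1.5, 3.0, 5.0}, for instance, p = 0.75 keeps two components, since 5.0 + 3.0 = 8.0 of a total of 10.0 already exceeds 7.5.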
Methods inherited from class de.unihalle.informatik.MiToBo.core.operator.MTBOperator |
---|
readResolve |
Methods inherited from class de.unihalle.informatik.Alida.operator.ALDOperator |
---|
addOperatorExecutionProgressEventListener, fieldContained, fireOperatorExecutionProgressEvent, getALDPortHashAccessKey, getConstructionMode, getHidingMode, getInInoutNames, getInInoutNames, getInNames, getInOutNames, getMissingRequiredInputs, getName, getNumParameters, getOutInoutNames, getOutNames, getParameter, getParameterDescriptor, getParameterNames, getSupplementalNames, getVerbose, getVersion, handleOperatorExecutionProgressEvent, isConfigured, print, print, print, printInterface, printInterface, readHistory, reinitializeParameterDescriptors, removeOperatorExecutionProgressEventListener, runOp, runOp, runOp, setConstructionMode, setHidingMode, setName, setParameter, setVerbose, toStringVerbose, unconfiguredItems, validate, validateCustom, validateGeneric, writeHistory, writeHistory, writeHistory |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int dataDim
protected int sampleCount
protected transient double[] mean
protected transient double[][] meanfreeData
protected transient Jama.Matrix meanfreeDataMatrix
protected transient Jama.Matrix C
The scaling by the number of samples is omitted here as this is just a constant factor in eigenvalue and -vector calculations.
protected transient double[] eigenVals
Note that the values are in ascending order.
protected transient Jama.Matrix eigenVects
The vectors are sorted according to their eigenvalues, i.e., the vector corresponding to the largest eigenvalue can be found in the last column.
protected transient int subDim
protected transient Jama.Matrix P_t
This matrix is already transposed, i.e., each row contains a sub-space basis vector and the number of rows is equal to the dimension of the sub-space.
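Because eigenVects stores one eigenvector per column sorted by ascending eigenvalue, the rows of P_t come from its last columns. The sketch below is a hypothetical illustration with plain arrays instead of Jama.Matrix; it assumes that the rows of P_t are ordered from the largest eigenvalue downwards and that the projection is applied to the mean-free data, neither of which is stated explicitly above.

```java
/**
 * Hypothetical sketch: builds P_t from eigenvectors sorted by ascending
 * eigenvalue and projects mean-free data (one sample per column).
 */
final class SubspaceProjection {

    /**
     * Returns a subDim x d matrix whose rows are the eigenvectors of the
     * subDim largest eigenvalues, i.e., the last columns of eigenVects.
     * Row order (largest eigenvalue first) is an assumption.
     */
    static double[][] transformation(double[][] eigenVects, int subDim) {
        int d = eigenVects.length, cols = eigenVects[0].length;
        double[][] pt = new double[subDim][d];
        for (int r = 0; r < subDim; r++) {
            int col = cols - 1 - r;
            for (int i = 0; i < d; i++) pt[r][i] = eigenVects[i][col];
        }
        return pt;
    }

    /** Projects mean-free data A (d x N): Y = P_t * A, a subDim x N matrix. */
    static double[][] project(double[][] pt, double[][] meanFree) {
        int m = pt.length, d = meanFree.length, n = meanFree[0].length;
        double[][] y = new double[m][n];
        for (int r = 0; r < m; r++)
            for (int k = 0; k < n; k++)
                for (int i = 0; i < d; i++)
                    y[r][k] += pt[r][i] * meanFree[i][k];
        return y;
    }
}
```

Since P_t is already transposed, the projection is a single matrix product and each column of the result is the sub-space representation of one sample.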
Constructor Detail |
---|
public PCA() throws de.unihalle.informatik.Alida.exceptions.ALDOperatorException
Throws:
de.unihalle.informatik.Alida.exceptions.ALDOperatorException
Method Detail |
---|
public void setDataset(double[][] ds)
Parameters:
ds - Dataset to process.

public void setMeanFreeData(boolean b)
Parameters:
b - If true, the input data is assumed to be mean-free already.

public void setReductionMode(PCA.ReductionMode rm)
Parameters:
rm - Mode for dimension reduction.

public void setNumberOfComponents(int compNum)
Parameters:
compNum - Number of components, i.e., eigenvectors, to use.

public void setPercentageOfVariance(double p)
Parameters:
p - Fraction of variance to represent in sub-space.

public double[][] getResultData()
public double[] getEigenvalues()
public double[][] getEigenvects()
protected void operate()
Specified by:
operate in class de.unihalle.informatik.Alida.operator.ALDOperator
protected void examineDataset()
protected void calculateMeanFreeData()
protected void calculateCovarianceMatrixAndEigenstuff()
protected void determineSubspaceDimension()
protected void doDimensionReduction()
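PCA stores its results in Jama matrices; the sketch below instead uses the classical cyclic Jacobi method in plain Java so that it runs without dependencies. It is not the MiToBo implementation, but it returns results in the order the getters document: eigenvalues ascending and one eigenvector per column, so the eigenvector of the largest eigenvalue ends up in the last column. SymmetricEigen is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical sketch: eigen-decomposition of a symmetric matrix with the
 * cyclic Jacobi method, sorted by ascending eigenvalue.
 */
final class SymmetricEigen {
    final double[] values;    // eigenvalues, ascending
    final double[][] vectors; // vectors[i][j]: component i of eigenvector j

    SymmetricEigen(double[][] s) {
        int n = s.length;
        double[][] a = new double[n][];
        double norm2 = 0; // squared Frobenius norm, used for the stop criterion
        for (int i = 0; i < n; i++) {
            a[i] = s[i].clone();
            for (double x : s[i]) norm2 += x * x;
        }
        double[][] v = new double[n][n];
        for (int i = 0; i < n; i++) v[i][i] = 1.0;

        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0; // squared off-diagonal mass
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off <= 1e-24 * norm2) break;
            for (int p = 0; p < n - 1; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Rotation chosen so that the new a[p][q] becomes zero.
                    double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
                    double t = (theta >= 0 ? 1 : -1)
                            / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
                    double c = 1 / Math.sqrt(t * t + 1), sn = t * c;
                    for (int k = 0; k < n; k++) { // A <- A J (columns p, q)
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - sn * akq;
                        a[k][q] = sn * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) { // A <- J^T A (rows p, q)
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - sn * aqk;
                        a[q][k] = sn * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) { // V <- V J accumulates eigenvectors
                        double vkp = v[k][p], vkq = v[k][q];
                        v[k][p] = c * vkp - sn * vkq;
                        v[k][q] = sn * vkp + c * vkq;
                    }
                }
            }
        }
        // Sort eigenpairs by ascending eigenvalue (diagonal of the rotated matrix).
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> a[i][i]));
        values = new double[n];
        vectors = new double[n][n];
        for (int j = 0; j < n; j++) {
            values[j] = a[idx[j]][idx[j]];
            for (int i = 0; i < n; i++) vectors[i][j] = v[i][idx[j]];
        }
    }
}
```

Jacobi is used here only because it is short and robust for small symmetric matrices such as a covariance matrix of low dimension; for large matrices a tridiagonalization-based solver is the usual choice.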