Monday, May 30, 2011

Principal Component Analysis matlab code

The Principal Component Analysis (PCA) is one of the most successful techniques that have been used in image recognition and compression. PCA is a statistical method under the broad title of factor analysis. The purpose of PCA is to reduce the large dimensionality of the data space (observed variables) to the smaller intrinsic dimensionality of feature space (independent variables), which are needed to describe the data economically. This is the case when there is a strong correlation between observed variables.

The jobs which PCA can do are prediction, redundancy removal, feature extraction, data compression, etc. Because PCA is a classical technique which can do something in the linear domain, applications having linear models are suitable, such as signal processing, image processing, system and control theory, communications, etc



WHEN PUBLISHING A PAPER AS A RESULT OF RESEARCH CONDUCTED BY USING THIS CODE
OR ANY PART OF IT, MAKE A REFERENCE TO THE FOLLOWING PAPER:
Delac K., Grgic M., Grgic S., Independent Comparative Study of PCA, ICA, and LDA
on the FERET Data Set, International Journal of Imaging Systems and Technology,
Vol. 15, Issue 5, 2006, pp. 252-260

=====================================================================================

pca.m

-------------------------------------------------------------------------------------

function pca (path, trainList, subDim)

if nargin < 3
subDim = dim - 1;
end;

disp(' ')

load listAll;

% Constants
numIm = 3816;


% Memory allocation for DATA matrix
fprintf('Creating DATA matrix\n')
tmp = imread ( [path char(listAll(1)) '.pgm'] );
[m, n] = size (tmp); % image size - used later also!!!
DATA = uint8 (zeros(m*n, numIm)); % Memory allocated
clear str tmp;

% Creating DATA matrix
for i = 1 : numIm
im = imread ( [path char(listAll(i)) '.pgm'] );
DATA(:, i) = reshape (im, m*n, 1);
end;
save DATA DATA;
clear im;

% Creating training images space
fprintf('Creating training images space\n')
dim = length (trainList);
imSpace = zeros (m*n, dim);
for i = 1 : dim
index = strmatch (trainList(i), listAll);
imSpace(:, i) = DATA(:, index);
end;
save imSpace imSpace;
clear DATA;

% Calculating mean face from training images
fprintf('Zero mean\n')
psi = mean(double(imSpace'))';
save psi psi;

% Zero mean
zeroMeanSpace = zeros(size(imSpace));
for i = 1 : dim
zeroMeanSpace(:, i) = double(imSpace(:, i)) - psi;
end;
save zeroMeanSpace zeroMeanSpace;
clear imSpace;

% PCA
fprintf('PCA\n')
L = zeroMeanSpace' * zeroMeanSpace; % Turk-Pentland trick (part 1)
[eigVecs, eigVals] = eig(L);

diagonal = diag(eigVals);
[diagonal, index] = sort(diagonal);
index = flipud(index);

pcaEigVals = zeros(size(eigVals));
for i = 1 : size(eigVals, 1)
pcaEigVals(i, i) = eigVals(index(i), index(i));
pcaEigVecs(:, i) = eigVecs(:, index(i));
end;

pcaEigVals = diag(pcaEigVals);
pcaEigVals = pcaEigVals / (dim-1);
pcaEigVals = pcaEigVals(1 : subDim); % Retaining only the largest subDim ones

pcaEigVecs = zeroMeanSpace * pcaEigVecs; % Turk-Pentland trick (part 2)

save pcaEigVals pcaEigVals;

% Normalisation to unit length
fprintf('Normalising\n')
for i = 1 : dim
pcaEigVecs(:, i) = pcaEigVecs(:, i) / norm(pcaEigVecs(:, i));
end;

% Dimensionality reduction.
fprintf('Creating lower dimensional subspace\n')
w = pcaEigVecs(:, 1:subDim);
save w w;
clear w;

% Subtract mean face from all images
load DATA;
load psi;
zeroMeanDATA = zeros(size(DATA));
for i = 1 : size(DATA, 2)
zeroMeanDATA(:, i) = double(DATA(:, i)) - psi;
end;
clear psi;
clear DATA;

% Project all images onto a new lower dimensional subspace (w)
fprintf('Projecting all images onto a new lower dimensional subspace\n')
load w;
pcaProj = w' * zeroMeanDATA;
clear w;
clear zeroMeanDATA;
save pcaProj pcaProj;

=====================================================================================

createDistMat.m

-------------------------------------------------------------------------------------

function distMat = createDistMat (proj, metric)

% Memory allocation
distMat = zeros(max(size(proj)));

switch (metric)

case 'L1'
distMat = pdist(proj', 'cityblock');
case 'L2'
distMat = pdist(proj', 'euclidean');
case 'COS'
distMat = pdist(proj', 'cosine');

otherwise
error('%s metric not supported.', metric);

end; % switch (metric) ends here

distMat = squareform(distMat);

=====================================================================================

feret.m

-------------------------------------------------------------------------------------

function [FERET] = feret(distMat, rank)

% Load feretGallery and probe lists
load listAll;
load feretGallery;
load fb;
load fc;
load dup1;
load dup2;

% Constants (number of images in feretGallery and probes)
numGalleryImgs = size (feretGallery, 1);
numFbImgs = size (fb, 1);
numFcImgs = size (fc, 1);
numDup1Imgs = size (dup1, 1);
numDup2Imgs = size (dup2, 1);

% Get the list of positions where feretGallery images are located in the list
% of all images and store it in variable index
feretGalleryIndex = getIndex (feretGallery, listAll);

% Get the list of all the probe images
fbIndex = getIndex (fb, listAll);
fcIndex = getIndex (fc, listAll);
dup1Index = getIndex (dup1, listAll);
dup2Index = getIndex (dup2, listAll);

% Calculate ranks for the CMS curve
% The results are stores in a structure
FERET.fb = getResults (distMat, feretGallery, feretGalleryIndex, fb, fbIndex, numFbImgs, rank);
FERET.fc = getResults (distMat, feretGallery, feretGalleryIndex, fc, fcIndex, numFcImgs, rank);
FERET.dup1 = getResults (distMat, feretGallery, feretGalleryIndex, dup1, dup1Index, numDup1Imgs, rank);
FERET.dup2 = getResults (distMat, feretGallery, feretGalleryIndex, dup2, dup2Index, numDup2Imgs, rank);

%**************************************************************************
% Statistics (CMS curve, total number of probe images)
%**************************************************************************

function [RESULTS] = getResults (distMat, feretGallery, feretGalleryIndex, probe, probeIndex, numProbeImgs, rank)

for j = 1 : rank
for i = 1 : numProbeImgs

position = probeIndex(i);
currentRow = distMat(position,:);
reduced = currentRow(1, feretGalleryIndex);

[Y, I] = sort(reduced);
inx = I(j);

% Determine if G=H based on the first 5 characters of the filename
G = char(feretGallery(inx));
H = char(probe(i));
correct(i) = strncmp(G, H, 5);
% if G=H correct(i)=1, else 0

end;

% Rank 1 result (percentage)
if j == 1
RESULTS.rank1 = (sum(correct)/numProbeImgs)*100;
end;

% Percentage of correctly recognized images for a given probe set
RESULTS.perc(j) = (sum(correct)/numProbeImgs)*100;

% CMS curve
RESULTS.cms(j) = sum(RESULTS.perc(1:j));

end;

% Total number of probe images
RESULTS.numProbeImgs = numProbeImgs;
%**************************************************************************

%**************************************************************************
% Find positions of probe or feretGallery images in the list of all images
%**************************************************************************

function [index] = getIndex (sub, all)

num = size (sub, 1);
for i = 1 : num
index(i) = strmatch(sub(i), all);
end;
%**************************************************************************

3 comments:

  1. Good information. And i alway like to read the quality content. And i am really happy to found this information on your blog. Thanks for sharing this opportunity to leave a comment.
    domain hosting india

    ReplyDelete
  2. I used this code but I have this error
    Undefined function or variable 'dim'.
    Error in pca (line 80)
    subDim = dim - 1;
    can you help me ?

    ReplyDelete