
pca

PURPOSE

PCA Low-rank approximation in SVD form.

SYNOPSIS

function [U,S,V] = pca(A,k,its,l)

DESCRIPTION

PCA  Low-rank approximation in SVD form.


   [U,S,V] = PCA(A)  constructs a nearly optimal rank-6 approximation
             USV' to A, using 2 full iterations of a block Lanczos method
             of block size 6+2=8, started with an n x 8 random matrix,
             when A is m x n; the ref. below explains "nearly optimal."
             The smallest dimension of A must be >= 6 when A is
             the only input to PCA.

   [U,S,V] = PCA(A,k)  constructs a nearly optimal rank-k approximation
             USV' to A, using 2 full iterations of a block Lanczos method
             of block size k+2, started with an n x (k+2) random matrix,
             when A is m x n; the ref. below explains "nearly optimal."
             k must be a positive integer <= the smallest dimension of A.

   [U,S,V] = PCA(A,k,its)  constructs a nearly optimal rank-k approx. USV'
             to A, using its full iterations of a block Lanczos method
             of block size k+2, started with an n x (k+2) random matrix,
             when A is m x n; the ref. below explains "nearly optimal."
             k must be a positive integer <= the smallest dimension of A,
             and its must be a nonnegative integer.

   [U,S,V] = PCA(A,k,its,l)  constructs a nearly optimal rank-k approx.
             USV' to A, using its full iterations of a block Lanczos method
             of block size l, started with an n x l random matrix,
             when A is m x n; the ref. below explains "nearly optimal."
             k must be a positive integer <= the smallest dimension of A,
             its must be a nonnegative integer,
             and l must be a positive integer >= k.


   The low-rank approximation USV' is in the form of an SVD in the sense
   that the columns of U are orthonormal, as are the columns of V,
   the entries of S are all nonnegative, and the only nonzero entries
   of S appear in non-increasing order on its diagonal.
   U is m x k, V is n x k, and S is k x k, when A is m x n.

   Increasing its or l improves the accuracy of the approximation USV'
   to A; the ref. below describes how the accuracy depends on its and l.
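
   For instance, the following sketch (assuming DIFFSNORM from this
   package is available) compares the error estimates obtained with
   its = 0 and its = 4; the second is typically no larger than the first:

      A = rand(1000,20)*rand(20,1000);
      [U0,S0,V0] = pca(A,6,0);
      [U4,S4,V4] = pca(A,6,4);
      diffsnorm(A,U0,S0,V0)
      diffsnorm(A,U4,S4,V4)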


   Note: PCA invokes RAND. To obtain repeatable results,
         invoke RAND('seed',j) with a fixed integer j before invoking PCA.
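
   For instance, a minimal sketch of obtaining repeatable results
   (j = 7 here is an arbitrary fixed integer, and the legacy
   RAND('seed',...) syntax is the one this note assumes):

      A = rand(100,50);
      rand('seed',7);
      [U1,S1,V1] = pca(A);
      rand('seed',7);
      [U2,S2,V2] = pca(A);   % should reproduce U1, S1, V1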

   Note: PCA does not center or normalize the rows or columns of the
         input matrix A; if such preprocessing is desired, the user
         must perform it before invoking PCA.
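
   For instance, a minimal sketch of centering and normalizing the
   columns of a matrix A before the call (this preprocessing is not
   performed by PCA itself, and it assumes no column has zero variance):

      A = rand(1000,50);
      Am = A - repmat(mean(A,1),size(A,1),1);
      Am = Am./repmat(std(Am,0,1),size(A,1),1);
      [U,S,V] = pca(Am,6);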

   Note: The user may ascertain the accuracy of the approximation USV'
         to A by invoking DIFFSNORM(A,U,S,V).


   inputs (the first is required):
   A -- matrix being approximated
   k -- rank of the approximation being constructed;
        k must be a positive integer <= the smallest dimension of A,
        and defaults to 6
   its -- number of full iterations of a block Lanczos method to conduct;
          its must be a nonnegative integer, and defaults to 2
   l -- block size of the block Lanczos iterations;
        l must be a positive integer >= k, and defaults to k+2

   outputs (all three are required):
   U -- m x k matrix in the rank-k approximation USV' to A,
        where A is m x n; the columns of U are orthonormal
   S -- k x k matrix in the rank-k approximation USV' to A,
        where A is m x n; the entries of S are all nonnegative,
        and its only nonzero entries appear in nonincreasing order
        on the diagonal
   V -- n x k matrix in the rank-k approximation USV' to A,
        where A is m x n; the columns of V are orthonormal


   Example:
     A = rand(1000,2)*rand(2,1000);
     A = A/normest(A);
     [U,S,V] = pca(A,2,0);
     diffsnorm(A,U,S,V)

     This code snippet produces a rank-2 approximation USV' to A such that
     the columns of U are orthonormal, as are the columns of V, and
     the entries of S are all nonnegative and are zero off the diagonal.
     diffsnorm(A,U,S,V) outputs an estimate of the spectral norm
     of A-USV', which should be close to the machine precision.


   Reference:
   Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp,
   Finding structure with randomness: Stochastic algorithms
   for constructing approximate matrix decompositions,
   arXiv:0909.4061 [math.NA; math.PR], 2009
   (available at http://arxiv.org).


   See also PCACOV, PRINCOMP, SVDS.


SOURCE CODE

function [U,S,V] = pca(A,k,its,l)
%PCA  Low-rank approximation in SVD form.
%
%
%   [U,S,V] = PCA(A)  constructs a nearly optimal rank-6 approximation
%             USV' to A, using 2 full iterations of a block Lanczos method
%             of block size 6+2=8, started with an n x 8 random matrix,
%             when A is m x n; the ref. below explains "nearly optimal."
%             The smallest dimension of A must be >= 6 when A is
%             the only input to PCA.
%
%   [U,S,V] = PCA(A,k)  constructs a nearly optimal rank-k approximation
%             USV' to A, using 2 full iterations of a block Lanczos method
%             of block size k+2, started with an n x (k+2) random matrix,
%             when A is m x n; the ref. below explains "nearly optimal."
%             k must be a positive integer <= the smallest dimension of A.
%
%   [U,S,V] = PCA(A,k,its)  constructs a nearly optimal rank-k approx. USV'
%             to A, using its full iterations of a block Lanczos method
%             of block size k+2, started with an n x (k+2) random matrix,
%             when A is m x n; the ref. below explains "nearly optimal."
%             k must be a positive integer <= the smallest dimension of A,
%             and its must be a nonnegative integer.
%
%   [U,S,V] = PCA(A,k,its,l)  constructs a nearly optimal rank-k approx.
%             USV' to A, using its full iterations of a block Lanczos method
%             of block size l, started with an n x l random matrix,
%             when A is m x n; the ref. below explains "nearly optimal."
%             k must be a positive integer <= the smallest dimension of A,
%             its must be a nonnegative integer,
%             and l must be a positive integer >= k.
%
%
%   The low-rank approximation USV' is in the form of an SVD in the sense
%   that the columns of U are orthonormal, as are the columns of V,
%   the entries of S are all nonnegative, and the only nonzero entries
%   of S appear in non-increasing order on its diagonal.
%   U is m x k, V is n x k, and S is k x k, when A is m x n.
%
%   Increasing its or l improves the accuracy of the approximation USV'
%   to A; the ref. below describes how the accuracy depends on its and l.
%
%
%   Note: PCA invokes RAND. To obtain repeatable results,
%         invoke RAND('seed',j) with a fixed integer j before invoking PCA.
%
%   Note: PCA does not center or normalize the rows or columns of the
%         input matrix A; if such preprocessing is desired, the user
%         must perform it before invoking PCA.
%
%   Note: The user may ascertain the accuracy of the approximation USV'
%         to A by invoking DIFFSNORM(A,U,S,V).
%
%
%   inputs (the first is required):
%   A -- matrix being approximated
%   k -- rank of the approximation being constructed;
%        k must be a positive integer <= the smallest dimension of A,
%        and defaults to 6
%   its -- number of full iterations of a block Lanczos method to conduct;
%          its must be a nonnegative integer, and defaults to 2
%   l -- block size of the block Lanczos iterations;
%        l must be a positive integer >= k, and defaults to k+2
%
%   outputs (all three are required):
%   U -- m x k matrix in the rank-k approximation USV' to A,
%        where A is m x n; the columns of U are orthonormal
%   S -- k x k matrix in the rank-k approximation USV' to A,
%        where A is m x n; the entries of S are all nonnegative,
%        and its only nonzero entries appear in nonincreasing order
%        on the diagonal
%   V -- n x k matrix in the rank-k approximation USV' to A,
%        where A is m x n; the columns of V are orthonormal
%
%
%   Example:
%     A = rand(1000,2)*rand(2,1000);
%     A = A/normest(A);
%     [U,S,V] = pca(A,2,0);
%     diffsnorm(A,U,S,V)
%
%     This code snippet produces a rank-2 approximation USV' to A such that
%     the columns of U are orthonormal, as are the columns of V, and
%     the entries of S are all nonnegative and are zero off the diagonal.
%     diffsnorm(A,U,S,V) outputs an estimate of the spectral norm
%     of A-USV', which should be close to the machine precision.
%
%
%   Reference:
%   Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp,
%   Finding structure with randomness: Stochastic algorithms
%   for constructing approximate matrix decompositions,
%   arXiv:0909.4061 [math.NA; math.PR], 2009
%   (available at http://arxiv.org).
%
%
%   See also PCACOV, PRINCOMP, SVDS.
%

%   Copyright 2009 Mark Tygert.

%
% Check the number of inputs.
%
if(nargin < 1)
  error('MATLAB:pca:TooFewIn',...
        'There must be at least 1 input.')
end

if(nargin > 4)
  error('MATLAB:pca:TooManyIn',...
        'There must be at most 4 inputs.')
end

%
% Check the number of outputs.
%
if(nargout ~= 3)
  error('MATLAB:pca:WrongNumOut',...
        'There must be exactly 3 outputs.')
end

%
% Set the inputs k, its, and l to default values, if necessary.
%
if(nargin == 1)
  k = 6;
  its = 2;
  l = k+2;
end

if(nargin == 2)
  its = 2;
  l = k+2;
end

if(nargin == 3)
  l = k+2;
end


%
% Check the first input argument.
%
if(~isfloat(A))
  error('MATLAB:pca:In1NotFloat',...
        'Input 1 must be a floating-point matrix.')
end

if(isempty(A))
  error('MATLAB:pca:In1Empty',...
        'Input 1 must not be empty.')
end

%
% Retrieve the dimensions of A.
%
[m n] = size(A);

%
% Check the remaining input arguments.
%
if(size(k,1) ~= 1 || size(k,2) ~= 1)
  error('MATLAB:pca:In2Not1x1',...
        'Input 2 must be a scalar.')
end

if(size(its,1) ~= 1 || size(its,2) ~= 1)
  error('MATLAB:pca:In3Not1x1',...
        'Input 3 must be a scalar.')
end

if(size(l,1) ~= 1 || size(l,2) ~= 1)
  error('MATLAB:pca:In4Not1x1',...
        'Input 4 must be a scalar.')
end

if(k <= 0)
  error('MATLAB:pca:In2NonPos',...
        'Input 2 must be > 0.')
end

if((k > m) || (k > n))
  error('MATLAB:pca:In2TooBig',...
        'Input 2 must be <= the smallest dimension of Input 1.')
end

if(its < 0)
  error('MATLAB:pca:In3Neg',...
        'Input 3 must be >= 0.')
end

if(l < k)
  error('MATLAB:pca:In4ltIn2',...
        'Input 4 must be >= Input 2.')
end

%
% SVD A directly if (its+1)*l >= m/1.25 or (its+1)*l >= n/1.25.
%
if(((its+1)*l >= m/1.25) || ((its+1)*l >= n/1.25))

  if(~issparse(A))
    [U,S,V] = svd(A,'econ');
  end

  if(issparse(A))
    [U,S,V] = svd(full(A),'econ');
  end
%
% Retain only the leftmost k columns of U, the leftmost k columns of V,
% and the uppermost leftmost k x k block of S.
%
  U = U(:,1:k);
  V = V(:,1:k);
  S = S(1:k,1:k);

  return

end


if(m >= n)

%
% Apply A to a random matrix, obtaining H.
%
  rand('seed',rand('seed'));

  if(isreal(A))
    H = A*(2*rand(n,l)-ones(n,l));
  end

  if(~isreal(A))
    H = A*( (2*rand(n,l)-ones(n,l)) + i*(2*rand(n,l)-ones(n,l)) );
  end

  rand('twister',rand('twister'));

%
% Initialize F to its final size and fill its leftmost block with H.
%
  F = zeros(m,(its+1)*l);
  F(1:m, 1:l) = H;

%
% Apply A*A' to H a total of its times,
% augmenting F with the new H each time.
%
  for it = 1:its
    H = (H'*A)';
    H = A*H;
    F(1:m, (1+it*l):((it+1)*l)) = H;
  end

  clear H;

%
% Form a matrix Q whose columns constitute an orthonormal basis
% for the columns of F.
%
  [Q,R,E] = qr(F,0);

  clear F R E;

%
% SVD Q'*A to obtain approximations to the singular values
% and right singular vectors of A; adjust the left singular vectors
% of Q'*A to approximate the left singular vectors of A.
%
  [U2,S,V] = svd(Q'*A,'econ');
  U = Q*U2;

  clear Q U2;

%
% Retain only the leftmost k columns of U, the leftmost k columns of V,
% and the uppermost leftmost k x k block of S.
%
  U = U(:,1:k);
  V = V(:,1:k);
  S = S(1:k,1:k);

end


if(m < n)

%
% Apply A' to a random matrix, obtaining H.
%
  rand('seed',rand('seed'));

  if(isreal(A))
    H = ((2*rand(l,m)-ones(l,m))*A)';
  end

  if(~isreal(A))
    H = (( (2*rand(l,m)-ones(l,m)) + i*(2*rand(l,m)-ones(l,m)) )*A)';
  end

  rand('twister',rand('twister'));

%
% Initialize F to its final size and fill its leftmost block with H.
%
  F = zeros(n,(its+1)*l);
  F(1:n, 1:l) = H;

%
% Apply A'*A to H a total of its times,
% augmenting F with the new H each time.
%
  for it = 1:its
    H = A*H;
    H = (H'*A)';
    F(1:n, (1+it*l):((it+1)*l)) = H;
  end

  clear H;

%
% Form a matrix Q whose columns constitute an orthonormal basis
% for the columns of F.
%
  [Q,R,E] = qr(F,0);

  clear F R E;

%
% SVD A*Q to obtain approximations to the singular values
% and left singular vectors of A; adjust the right singular vectors
% of A*Q to approximate the right singular vectors of A.
%
  [U,S,V2] = svd(A*Q,'econ');
  V = Q*V2;

  clear Q V2;

%
% Retain only the leftmost k columns of U, the leftmost k columns of V,
% and the uppermost leftmost k x k block of S.
%
  U = U(:,1:k);
  V = V(:,1:k);
  S = S(1:k,1:k);

end
