In this lab, you will see some applications of the SVD, including
Numerical methods for finding the singular value decomposition will also be addressed in this lab. One ``obvious'' algorithm involves finding the eigenvalues of $A^HA$, but this is not really practical because of roundoff difficulties caused by squaring the condition number of $A$. Matlab includes a function called svd with signature [U S V]=svd(A) to compute the singular value decomposition and we will be using it, too. This function uses the Lapack subroutine dgesvd, so if you were to need it in a Fortran or C program, it would be available by linking against the Lapack library.
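As a quick illustration of that signature (the small random matrix below is purely illustrative and is not part of the lab data), the following sketch checks the factorization returned by svd:

A = rand(5,3);        % small test matrix, for illustration only
[U, S, V] = svd(A);   % U is 5x5, S is 5x3, V is 3x3
norm(A - U*S*V')      % should be of the order of roundoff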
You may be interested in a six-minute video made by Cleve Moler in 1976
at what was then known as the Los Alamos Scientific Laboratory. Most
of the film is computer animated graphics. Moler recently retrieved the
film from storage, digitized it, and uploaded it to YouTube at
http://youtu.be/R9UoFyqJca8. You may also be interested in
a description of the making of the film, and of its use as background
graphics in the first Star Trek movie, which can be found in his blog at
http://blogs.mathworks.com/cleve/2012/12/10/1976-matrix-singular-value-decomposition-film/
This lab will take three sessions.
The matrix $\Sigma$ in (1) is $m\times n$ and is of the form $\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\dots,\sigma_p,0,\dots,0)$ (only the diagonal can be nonzero), with $\sigma_1\ge\sigma_2\ge\dots\ge\sigma_p>0$. Since $U$ and $V$ are unitary (and hence nonsingular), it is easy to see that the number $p$ is the rank of the matrix $A$ and is necessarily no larger than $\min(m,n)$. The ``singular values,'' $\sigma_k$, are real and positive and are the square roots of the nonzero eigenvalues of the Hermitian matrix $A^HA$.
Supposing that $p$ is the largest integer so that $\sigma_p\neq0$, then the SVD implies that the rank of the matrix is $p$. Similarly, if the matrix $A$ is $m\times n$, then the dimension of the nullspace of $A$ is $n-p$. The first $p$ columns of $U$ are an orthonormal basis of the range of $A$, and the last $n-p$ columns of $V$ are an orthonormal basis for the nullspace of $A$. Of course, in the presence of roundoff some judgement must be made as to what constitutes a zero singular value, but the SVD is the best (if expensive) way to discover the rank and nullity of a matrix. This is especially true when there is a large gap between the ``smallest of the large singular values'' and the ``largest of the small singular values.''
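As an illustration, here is one possible Matlab sketch of this use of the SVD; the tolerance tol below is an assumed roundoff cutoff, not a value prescribed by this lab.

[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(max(s));   % assumed roundoff-based cutoff
p = sum(s > tol);                   % numerical rank
rangeBasis = U(:, 1:p);             % orthonormal basis for the range of A
nullBasis  = V(:, p+1:end);         % orthonormal basis for the nullspace of A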
The SVD can also be used to solve a matrix system $Ax=b$. If the matrix $A$ is non-singular, then all of its singular values are strictly positive, and the SVD can be used to solve the system as $x=V\Sigma^{-1}U^Hb$, since $A^{-1}=(U\Sigma V^H)^{-1}=V\Sigma^{-1}U^H$.
Further, if $A$ is close to singular (or rectangular), a similar formula, but with the diagonal entries $1/\sigma_k$ replaced by zero whenever $\sigma_k<\epsilon$ for some small $\epsilon>0$, can work very nicely. In these latter cases, the ``solution'' is the least-squares best fit solution and the matrix $V\Sigma^{+}U^H$ (with $\Sigma^{+}$ the diagonal matrix just described) is called the Moore-Penrose pseudo-inverse of $A$.
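A minimal Matlab sketch of this idea follows; it assumes a matrix A with at least as many rows as columns, a right-hand side b, and an illustrative cutoff epsilon (none of these names or values are prescribed by the lab).

[U, S, V] = svd(A, 0);              % economy-size SVD
s = diag(S);
epsilon = 1.e-8 * max(s);           % assumed cutoff; choose to suit the problem
sinv = zeros(size(s));
sinv(s > epsilon) = 1 ./ s(s > epsilon);   % reciprocals, with tiny sigma_k zeroed
x = V * (sinv .* (U' * b));         % least-squares (pseudo-inverse) solution of A*x=b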
Imagine that you have been given many ``samples'' of related data involving several variables and you wish to find a linear relationship among the variables that approximates the given data in the best-fit sense. This can be written as a rectangular system
The system (3) can be written in matrix form. Recall that the values $x_j$ are the unknowns. For a moment, denote the vector $x=(x_j)$, the matrix $A=(a_{kj})$ and the vector $b=(b_k)$, all for $k=1,\dots,N$ and $j=1,\dots,M$, where $N>M$. With this notation, (3) can be written as the matrix equation $Ax=b$, where $A$ is (usually) a rectangular matrix. A ``least-squares'' solution of $Ax=b$ is a vector $x$ minimizing the Euclidean norm $\|Ax-b\|$.
To see why the pseudo-inverse provides a least-squares solution, suppose that the rank of $A$ is $r$, so that there are precisely $r$ positive singular values of $A$. Writing $z=V^Hx$ and $\beta=U^Hb$, and using the fact that multiplication by the unitary matrix $U^H$ does not change norms,
$$\|Ax-b\|^2=\|\Sigma z-\beta\|^2=\sum_{k=1}^{r}(\sigma_k z_k-\beta_k)^2+\sum_{k=r+1}^{N}\beta_k^2,$$
and the right side is minimized by choosing $z_k=\beta_k/\sigma_k$ for $k\le r$ (and, conventionally, $z_k=0$ for $k>r$), which is exactly the pseudo-inverse solution described above.
As we have done several times before, we will first generate a data set with known solution and then use svd to recover the known solution. Place Matlab code for the following steps into a script m-file called exer1.m
N=20;
d1=rand(N,1);
d2=rand(N,1);
d3=rand(N,1);
d4=4*d1-3*d2+2*d3-1;
It should be obvious that these vectors satisfy the equation
$$4d_1-3d_2+2d_3-d_4=1,$$
but, if you were solving a real problem, you would not be aware of this equation, and it would be your objective to discover it, i.e., to find the coefficients in the equation.
EPSILON=1.e-5;
d1=d1.*(1+EPSILON*rand(N,1));
d2=d2.*(1+EPSILON*rand(N,1));
d3=d3.*(1+EPSILON*rand(N,1));
d4=d4.*(1+EPSILON*rand(N,1));
You have now constructed the ``data'' for the following least-squares calculation.
A=[d1 d2 d3 d4];
[U S V]=svd(A);
Please include the four non-zero values of S in your summary, but not the matrices U and V.
Find this vector by setting b=ones(N,1) (the constant terms in Equation (3) have been moved to the right, so they are all equal to 1) and using Equation (2) above. Include your solution with your summary. It should be close to the known solution x=[4;-3;2;-1].
Sometimes the data you are given turns out to be deficient because the variables supposed to be independent are actually related. This dependency will cause the coefficient matrix to be singular or nearly singular. When the matrix is singular, the system of equations is actually redundant, and one equation can be eliminated. This yields fewer equations than unknowns and any member of an affine subspace can be legitimately regarded as ``the'' solution.
Dealing with dependencies of this sort is one of the greatest difficulties in using least-squares fitting methods because it is hard to know which of the equations is the best one to eliminate. The singular value decomposition is the best way to deal with dependencies. In the following exercise you will construct a deficient set of data and see how to use the singular value decomposition to find the solution.
d3=rand(N,1);
with the line
d3=d1+d2;
This makes the data deficient and introduces nonuniqueness into the solution for the coefficients.
and find the coefficient vector x as in (2). The solution you found is probably not x=[4;-3;2;-1].
In the following exercise you will see how the singular value decomposition can be used to ``compress'' a graphical figure by representing the figure as a matrix and then using the singular value decomposition to find the closest matrix of lower rank to the original. This approach can form the basis of efficient compression methods.
nasacolor=imread('TarantulaNebula.jpg');
The variable nasacolor will be a three-dimensional array (rows by columns by the three color components) of integer values between 0 and 255.
image(nasacolor)
nasa=sum(nasacolor,3,'double'); %sum up red+green+blue
m=max(max(nasa));               %find the max value
nasa=nasa*255/m;                %make this be bright white
The result from these commands is that nasa is an ordinary array of double precision numbers.
colormap(gray(256));
to set a grayscale colormap. Now you can show the picture with the command
image(nasa)
title('Grayscale NASA photo');
Please do not send me this plot.
nasa100=U(:,1:100)*S(1:100,1:100)*V(:,1:100)';
nasa50=U(:,1:50)*S(1:50,1:50)*V(:,1:50)';
nasa25=U(:,1:25)*S(1:25,1:25)*V(:,1:25)';
These matrices are of lower rank than the nasa matrix, and can be stored in a more efficient manner. (The nasa matrix requires 357,210 numbers to store it. In contrast, nasa50 can be reconstructed from the first 50 columns of U and V plus the first 50 diagonal entries of S, for a total of 56,750 numbers. This is better than a four to one savings in space.) Do not include these matrices with your summary!
We now turn to the question of how to compute the SVD.
The easiest algorithm for the SVD is to note the relationship between it and the eigenvalue decomposition: singular values are the square roots of the eigenvalues of $A^HA$ or $AA^H$. Indeed, if $A=U\Sigma V^H$, then
$$A^HA=V(\Sigma^H\Sigma)V^H \qquad\mbox{and}\qquad AA^H=U(\Sigma\Sigma^H)U^H.$$
Unfortunately, this is not a good algorithm because forming the product $A^HA$ roughly squares the condition number, so that the eigenvalue solution is not likely to be accurate. Of course, it will work fine for small matrices with small condition numbers and you can find this algorithm presented in many web pages. Don't believe everything you see on the internet.
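For completeness, here is a sketch of that naive approach for a real matrix A with no zero singular values; it is shown only to make the idea concrete, not as a recommended method, and the variable names are illustrative.

[V, D] = eig(A'*A);                 % eigenvalues of A'*A are the squares sigma^2
[sig2, idx] = sort(diag(D), 'descend');
V = V(:, idx);                      % reorder eigenvectors to match
S = diag(sqrt(max(sig2, 0)));       % clamp tiny negatives caused by roundoff
U = A * V / S;                      % valid only when all singular values are nonzero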
A more practical algorithm is a Jacobi algorithm that is given in a 1989 report by James Demmel and Krešimir Veselic, which can be found at http://www.netlib.org/lapack/lawnspdf/lawn15.pdf. The algorithm is a one-sided Jacobi iterative algorithm that appears as Algorithm 4.1, p. 32, of that report. This algorithm amounts to the classical Jacobi algorithm for finding eigenvalues of a symmetric matrix, applied implicitly to $A^HA$. (See, for example, J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965, or G. H. Golub, C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 3rd Ed, 1996.)
The algorithm implicitly computes the product $A^HA$ and then
uses a sequence of ``Jacobi rotations'' to diagonalize it. A Jacobi
rotation is a $2\times2$ rotation matrix that annihilates the off-diagonal
term of a $2\times2$ symmetric matrix. You can find more in a nice
Wikipedia article located at
http://en.wikipedia.org/wiki/Jacobi_rotation.
Given a $2\times2$ symmetric matrix
$$M=\begin{pmatrix}\alpha & \gamma\\ \gamma & \beta\end{pmatrix},$$
it is possible to find a ``rotation by angle $\theta$'' matrix
$$Q=\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}$$
with the property that $Q^TMQ$ is a diagonal matrix. $Q$ can be found by multiplying out the matrix product, expressing the off-diagonal component in terms of $t=\tan\theta$, and setting it to zero to arrive at the equation
$$t^2+2\zeta t-1=0,$$
where $\zeta=(\beta-\alpha)/(2\gamma)$. The quadratic formula gives $t=\tan\theta$, and then $\sin\theta$ and $\cos\theta$ can be recovered. There is no need to recover $\theta$ itself.
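In particular, choosing the root of smaller magnitude (a standard choice that keeps the rotation angle small, and the one consistent with the code skeleton below) gives
$$t=\tan\theta=\frac{\operatorname{sign}(\zeta)}{|\zeta|+\sqrt{1+\zeta^2}},\qquad \cos\theta=\frac{1}{\sqrt{1+t^2}},\qquad \sin\theta=t\cos\theta.$$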
The following algorithm repeatedly passes down the diagonal of the
implicitly-constructed matrix $U^HU$ (where the working matrix $U$ starts out
equal to $A$), choosing pairs of indices $k<j$, constructing the $2\times2$
submatrix from the intersection of rows and columns $k$ and $j$, and using
Jacobi rotations to annihilate its off-diagonal term. Unfortunately, the
Jacobi rotation for one pair of indices generally spoils the work done for
earlier pairs, so the process must be repeated until convergence. At
convergence, the matrix $\Sigma$ of the SVD has been implicitly generated:
the singular values are the norms of the columns of the final working matrix,
the left singular vectors are those columns normalized, and the right singular
vectors are recovered by multiplying all the Jacobi rotations together.
The following algorithm carries out this process.
Note: For the purpose of this lab, the matrix $A$
will be assumed to be square. Rectangular matrices introduce
too much algorithmic complexity.
Given a convergence criterion $\epsilon$, a square matrix $A$, a matrix $U$ that starts out as $U=A$ and ends up as a matrix whose columns are the left singular vectors, and another matrix $V$ that starts out as the identity matrix and ends up as a matrix whose columns are the right singular vectors.
function [U,S,V]=jacobi_svd(A)
% [U S V]=jacobi_svd(A)
% A is original square matrix
% Singular values come back in S (diag matrix)
% orig matrix = U*S*V'
%
% One-sided Jacobi algorithm for SVD
% lawn15: Demmel, Veselic, 1989,
% Algorithm 4.1, p. 32

TOL=1.e-8;
MAX_STEPS=40;

n=size(A,1);
U=A;
V=eye(n);
for steps=1:MAX_STEPS
  converge=0;
  for j=2:n
    for k=1:j-1
      % compute [alpha gamma;gamma beta]=(k,j) submatrix of U'*U
      alpha=???  %might be more than 1 line
      beta=???   %might be more than 1 line
      gamma=???  %might be more than 1 line
      converge=max(converge,abs(gamma)/sqrt(alpha*beta));

      % compute Jacobi rotation that diagonalizes
      % [alpha gamma;gamma beta]
      if gamma ~= 0
        zeta=(beta-alpha)/(2*gamma);
        t=sign(zeta)/(abs(zeta)+sqrt(1+zeta^2));
      else
        % if gamma=0, then zeta=infinity and t=0
        t=0;
      end
      c=???
      s=???

      % update columns k and j of U
      T=U(:,k);
      U(:,k)=c*T-s*U(:,j);
      U(:,j)=s*T+c*U(:,j);

      % update matrix V of right singular vectors
      ???

    end
  end
  if converge < TOL
    break;
  end
end
if steps >= MAX_STEPS
  error('jacobi_svd failed to converge!');
end

% the singular values are the norms of the columns of U
% the left singular vectors are the normalized columns of U
for j=1:n
  singvals(j)=norm(U(:,j));
  U(:,j)=U(:,j)/singvals(j);
end
S=diag(singvals);
U=[0.6  0.8
   0.8 -0.6];
V=sqrt(2)/2*[1  1
             1 -1];
S=diag([5 4]);
It is easy to see that U and V are orthogonal matrices, so that the matrices U, S and V comprise the SVD of A. You may get a ``division by zero'' error, but this is harmless because it comes from gamma being zero, which will cause zeta to be infinite and t to be zero anyhow.
Your algorithm should essentially reproduce the matrices U, S and V. You might find that the diagonal entries of S are not in order, so that the U and V are similarly permuted, or you might observe that certain columns of U or V have been multiplied by (-1). Be sure, however, that an even number of factors of (-1) have been introduced.
If you do not get the right answers, you can debug your code in the following way. First, note that there is only a single term, k=1 and j=2 in the double for loop.
A1 = [ 1 3 2
       5 6 4
       7 8 9];
Check that U1 and V1 are orthogonal matrices and that A1=U1*S1*V1' up to roundoff. In addition, print U1, S1 and V1 using format long and visually compare them with the singular values you get from the Matlab svd function. You should find they agree almost to roundoff, despite the coarse tolerance inside jacobi_svd. You also will note that the values do not come out sorted, as they do from svd. It is a simple matter to sort them, but then you would have to permute the columns of U1 and V1 to match. Again, some columns of U1 and V1 might be multiplied by (-1) and, if so, there must be an even number of those sign changes. Please include U1, S1 and V1 in your summary.
You now have an implementation of one SVD algorithm. This algorithm is a good one, with some realistic applications, and is one of the two algorithms available for the SVD in the GNU scientific library. (See http://www.gnu.org/software/gsl/ for more detail.) In the following sections, you will see a different algorithm.
The most commonly used SVD algorithm is found in Matlab and in the Lapack linear algebra library. (See http://www.netlib.org/lapack/.) It is a revised version of one that appeared in Golub and Van Loan. The revised version is presented by J. Demmel, W. Kahan, ``Accurate Singular Values of Bidiagonal Matrices,'' SIAM J. Sci. Stat. Comput., 11(1990) pp. 873-912. This paper can also be found at http://www.netlib.org/lapack/lawnspdf/lawn03.pdf.
The standard algorithm is a two-step algorithm. It first reduces the matrix to bidiagonal form and then finds the SVD of the bidiagonal matrix. Reduction to bidiagonal form is accomplished using Householder transformations, a topic you have already seen. Finding the SVD of a bidiagonal matrix is an iterative process that must be carefully performed in order to minimize both numerical errors and the number of iterations required. To accomplish these tasks, the algorithm chooses whether Golub and Van Loan's original algorithm is better than Demmel and Kahan's, or vice-versa. Further, other choices are made to speed up each iteration. There is not time to discuss all these details, so we will only consider a simplified version of Demmel and Kahan's zero-shift algorithm.
In the Jacobi algorithm in the previous section, you saw how the two matrices $U$ and $V$ can be constructed by multiplying the various rotation matrices together as the iterations progress. This procedure is the same for the standard algorithm, so, in the interest of simplicity, most of the rest of this lab will be concerned only with the singular values themselves.
The first step in the algorithm is to reduce the matrix to bidiagonal form. You have already seen how to use Householder matrices to reduce a matrix to upper-triangular form. The idea, you recall, is to start with matrices $U$ and $V$ equal to the identity matrix and $B=A$, so that $A=UBV'$. Then Householder matrices are applied alternately on the left, to annihilate the entries below the diagonal in a column of $B$, and on the right, to annihilate the entries to the right of the superdiagonal in a row, all while maintaining the equality $A=UBV'$.
function [U,B,V]=bidiag_reduction(A)
% [U B V]=bidiag_reduction(A)
% Algorithm 6.5-1 in Golub & Van Loan, Matrix Computations
% Johns Hopkins University Press
% Finds an upper bidiagonal matrix B so that A=U*B*V'
% with U,V orthogonal.  A is an m x n matrix
[m,n]=size(A);
B=A;
U=eye(m);
V=eye(n);
for k=1:n-1
  % eliminate non-zeros below the diagonal
  % Keep the product U*B unchanged
  H=householder(B(:,k),k);
  B=H*B;
  U=U*H';

  % eliminate non-zeros to the right of the
  % superdiagonal by working with the transpose
  % Keep the product B*V' unchanged
  ???

end
A=rand(100,100);
Be sure that the condition that A=U*B*V' is satisfied, that the matrix B is bidiagonal, and that U and V are orthogonal matrices. Describe how you checked these facts and include the results of your tests in the summary. Hint: You can use the Matlab diag function to extract any diagonal from a matrix or to reconstruct a matrix from a diagonal. Use ``help diag'' to find out how. Alternatively, you can use the tril and triu functions.
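One possible set of checks is sketched below; this is just one way to do it, and it uses diag to strip off the two diagonals of B that are allowed to be nonzero.

[U, B, V] = bidiag_reduction(A);
norm(A - U*B*V', 'fro')                              % should be roundoff-small
norm(B - diag(diag(B)) - diag(diag(B,1),1), 'fro')   % off-bidiagonal part of B
norm(U'*U - eye(size(U,1)), 'fro')                   % is U orthogonal?
norm(V'*V - eye(size(V,1)), 'fro')                   % is V orthogonal?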
An important piece of Demmel and Kahan's algorithm is a very efficient way of generating a ``Givens'' rotation matrix that annihilates the second component of a vector. The algorithm is presented on page 13 of their paper and is called ``rot.'' It is important to have an efficient algorithm because this step is the central step in computing the SVD. A 10% (for example) reduction in time for rot translates into a 10% reduction in time for the SVD.
This algorithm computes the cosine, $c$, and sine, $s$, of a rotation angle that satisfies the following condition:
$$\begin{pmatrix}c & s\\ -s & c\end{pmatrix}\begin{pmatrix}f\\ g\end{pmatrix}=\begin{pmatrix}r\\ 0\end{pmatrix}.$$
Remark: The two different expressions for $c$, $s$ and $r$ are mathematically equivalent. The choice of which one to use is based on minimizing roundoff errors.
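Whatever form the implementation takes, the rotation must satisfy the condition above, which means (up to the sign convention chosen for $r$ and the roundoff-avoiding rearrangements described in the paper)
$$r=\pm\sqrt{f^2+g^2},\qquad c=\frac{f}{r},\qquad s=\frac{g}{r},$$
so that $cf+sg=r$ and $cg-sf=0$; the special case $f=0$ gives $c=0$, $s=1$, $r=g$.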
function [c,s,r]=rot(f,g)
% [c s r]=rot(f,g)
% more comments
% your name and the date
[c s;-s c]*[f;g]
and check that the resulting vector is [r;0].
The standard algorithm is based on repeated QR-type ``sweeps'' through the bidiagonal matrix. Suppose $B$ is a bidiagonal matrix, so that it looks like the following:
$$B=\begin{pmatrix} d_1 & e_1 & & & \\ & d_2 & e_2 & & \\ & & \ddots & \ddots & \\ & & & d_{n-1} & e_{n-1}\\ & & & & d_n \end{pmatrix}.$$
In its simplest form, a sweep begins at the top left of the matrix and runs down the rows to the bottom. For row $k$, the sweep first annihilates the $B(k,k+1)$ entry by multiplying on the right by a rotation matrix. This action introduces a non-zero value $B(k+1,k)$ immediately below the diagonal. This new non-zero is then annihilated by multiplying on the left by a rotation matrix, but that action introduces a non-zero value $B(k,k+2)$, outside the superdiagonal. Conveniently, this new non-zero is annihilated by the same rotation matrix that annihilates $B(k+1,k+2)$ during the next step down. (The proof uses the fact that $B(k,k+1)$ was zero as a consequence of the first rotation; see the paper.) The algorithm continues until the final step on the bottom row, which introduces no new non-zeros. This process has been termed ``chasing the bulge'' and will be illustrated in the following exercise.
The following Matlab code implements the above algorithm. This code will be used in the following exercise to illustrate the algorithm.
function B=msweep(B)
% B=msweep(B)
% Demmel & Kahan zero-shift QR downward sweep
% B starts as a bidiagonal matrix and is returned as
% a bidiagonal matrix

n=size(B,1);

for k=1:n-1
  [c s r]=rot(B(k,k),B(k,k+1));

  % Construct matrix Q and multiply on the right by Q'.
  % Q annihilates both B(k-1,k+1) and B(k,k+1)
  % but makes B(k+1,k) non-zero.
  Q=eye(n);
  Q(k:k+1,k:k+1)=[c s;-s c];
  B=B*Q';

  [c s r]=rot(B(k,k),B(k+1,k));

  % Construct matrix Q and multiply on the left by Q.
  % Q annihilates B(k+1,k) but makes B(k,k+1) and
  % B(k,k+2) non-zero.
  Q=eye(n);
  Q(k:k+1,k:k+1)=[c s;-s c];
  B=Q*B;
end
In this algorithm, there are two orthogonal (rotation) matrices, $Q$, employed at each step $k$. To see what their action is, consider the piece of $B$ consisting of rows and columns $k$, $k+1$, and $k+2$. Multiplying on the right by the transpose of the first rotation matrix has the following consequence.
where asterisks indicate (possible) non-zeros and Greek letters indicate values that do not change. The fact that two entries are annihilated in the third column is not obvious and is proved in the paper.
The second matrix multiplies on the left and has the following consequence.
The important consequence of these two rotations is that the row with three non-zeros in it (the ``bulge'') has moved from row $k$ to row $k+1$ (in this example, from row 1 to row 2), while all other rows still have two non-zeros. This action is illustrated graphically in the following exercise.
% set almost-zero entries to true zero
% display matrix and wait for a key
B(find(abs(B)<1.e-13))=0;
spy(B)
disp('Plot completed.  Strike a key to continue.')
pause
Insert two copies of this code into msweep.m, one after each of the statements that start off ``B=.''
B=[1 11  0  0  0  0  0  0  0  0
   0  2 12  0  0  0  0  0  0  0
   0  0  3 13  0  0  0  0  0  0
   0  0  0  4 14  0  0  0  0  0
   0  0  0  0  5 15  0  0  0  0
   0  0  0  0  0  6 16  0  0  0
   0  0  0  0  0  0  7 17  0  0
   0  0  0  0  0  0  0  8 18  0
   0  0  0  0  0  0  0  0  9 19
   0  0  0  0  0  0  0  0  0 10];
The sequence of plots shows the ``bulge'' (extra non-zero entry) migrates from the top of the matrix down the rows until it disappears off the end. Please include any one of the plots with your work. If B=msweep(B); what is B(10,10)?
Demmel and Kahan present a streamlined form of this algorithm in their paper. This algorithm is not only faster but is more accurate because there are no subtractions to introduce cancellation and roundoff inaccuracy. Their algorithm is presented below.
The largest difference in speed comes from the representation of the bidiagonal matrix as a pair of vectors, $d$ and $e$, holding the diagonal and superdiagonal entries, respectively. In addition, the intermediate matrices produced by the rotations are not explicitly formed; only the few entries they change are updated.
This algorithm begins and ends with the two vectors $d$ and $e$, representing the diagonal and superdiagonal of a bidiagonal matrix. The vector $d$ has length $n$, and $e$ has length $n-1$.
function [d,e]=vsweep(d,e)
% [d e]=vsweep(d,e)
% comments
% your name and the date
B=[1 11  0  0  0  0  0  0  0  0
   0  2 12  0  0  0  0  0  0  0
   0  0  3 13  0  0  0  0  0  0
   0  0  0  4 14  0  0  0  0  0
   0  0  0  0  5 15  0  0  0  0
   0  0  0  0  0  6 16  0  0  0
   0  0  0  0  0  0  7 17  0  0
   0  0  0  0  0  0  0  8 18  0
   0  0  0  0  0  0  0  0  9 19
   0  0  0  0  0  0  0  0  0 10];
d=diag(B);
e=diag(B,1);
n=200;
d=2*rand(n,1)-1;     %random numbers between -1 and 1
e=2*rand(n-1,1)-1;
B=diag(d)+diag(e,1);
tic;B=msweep(B);mtime=toc
tic;[d e]=vsweep(d,e);vtime=toc
norm(d-diag(B))
norm(e-diag(B,1))
It turns out that repeated sweeps tend to reduce the size of the superdiagonal entries. You can find a discussion of convergence in Demmel and Kahan, and the references therein, but the basic idea is to watch the values of $e$ and, when they become small, set them to zero. Demmel and Kahan prove that this does not perturb the singular values much. In the following exercise, you will see how the superdiagonal size decreases.
d=[1;2;3;4;5;6;7;8;9;10];
e=[11;12;13;14;15;16;17;18;19];
Use the following code to plot the absolute values of the entries of e for each of 15 repetitions of the algorithm, starting from the values of d and e above.
for k=1:15
  [d,e]=vsweep(d,e);
  plot(abs(e),'*-')
  hold on
  disp('Strike a key to continue.')
  pause
end
hold off
You should observe that some entries converge quite rapidly to zero, and they all eventually decrease. Please include this plot with your summary.
d=[1;2;3;4;5;6;7;8;9;10];
e=[11;12;13;14;15;16;17;18;19];
Please include this plot with your summary.
You should have observed that one entry converges very rapidly to zero, and that overall convergence to zero is asymptotically linear. It is often one of the two ends that converges most rapidly. Further, a judiciously chosen shift can make convergence even more rapid. We will not investigate these issues further here, but you can be sure that the software appearing, for example, in Lapack (used by Matlab) for the SVD takes advantage of these and other acceleration methods.
In the following exercise, the superdiagonal is examined during the iteration. As the values become small, they are set to zero, and the iteration continues with a smaller matrix. When all the superdiagonals are zero, the iteration is complete.
function [d,iterations]=bd_svd(d,e)
% [d,iterations]=bd_svd(d,e)
% more comments
% your name and the date

TOL=100*eps;
n=length(d);
maxit=500*n^2;

% The following convergence criterion is discussed by
% Demmel and Kahan.
lambda(n)=abs(d(n));
for j=n-1:-1:1
  lambda(j)=abs(d(j))*lambda(j+1)/(lambda(j+1)+abs(e(j)));
end
mu(1)=abs(d(1));
for j=1:n-1
  mu(j+1)=abs(d(j+1))*mu(j)/(mu(j)+abs(e(j)));
end
sigmaLower=min(min(lambda),min(mu));
thresh=max(TOL*sigmaLower,maxit*realmin);

iUpper=n-1;
iLower=1;
for iterations=1:maxit
  % reduce the problem size when zeros appear on the
  % superdiagonal; don't iterate further on parts
  % where the e values are already small

  % how many small values are near the bottom right?
  j=iUpper;
  for i=iUpper:-1:1
    if abs(e(i))>thresh
      j=i;
      break;
    end
  end
  iUpper=j;

  % how many small values are near the top left?
  j=iUpper;
  for i=iLower:iUpper
    if abs(e(i))>thresh
      j=i;
      break;
    end
  end
  iLower=j;

  if (iUpper==iLower & abs(e(iUpper))<=thresh) | ...
     (iUpper<iLower)
    % all done, sort singular values
    d=sort(abs(d),1,'descend');  %change to descending sort
    return
  end

  % do a sweep
  [d(iLower:iUpper+1),e(iLower:iUpper)]= ...
      vsweep(d(iLower:iUpper+1),e(iLower:iUpper));
end
error('bd_svd: too many iterations!')
d=[1;2;3;4;5;6;7;8;9;10];
e=[11;12;13;14;15;16;17;18;19];
B=diag(d)+diag(e,1);
What are the singular values that bd_svd produces? How many iterations did bd_svd need? What is the largest of the differences between the singular values bd_svd found and those that the Matlab function svd found for the matrix B?
d=rand(30,1);
e=rand(29,1);
B=diag(d)+diag(e,1);
Use bd_svd to compute its singular values. How many iterations does it take? What is the largest of the differences between the singular values bd_svd found and those that the Matlab function svd found for the matrix B?
Remark: In the above algorithm, you can see that the singular values are forced to be positive by taking absolute values. This is legal because if a negative singular value arises, then multiplying both it and the corresponding column of $U$ by negative one does not change the unitary nature of $U$ and leaves the singular value positive.
You saw in the previous exercise that the number of iterations can get large. It turns out that the number of iterations is dramatically reduced by the proper choice of shift, and tricks such as sometimes choosing to run sweeps up from the bottom right to the top left instead of always down as we did, depending on the matrix elements. Nonetheless, the presentation here should give you the flavor of the algorithm used.
In the following exercise, you will put your two functions, bidiag_reduction and bd_svd together into a function mysvd to find the singular values of a full matrix.
function [d,iterations]=mysvd(A)
% [d,iterations]=mysvd(A)
% more comments
% your name and the date
This function should: use bidiag_reduction to reduce A to an upper bidiagonal matrix; extract the diagonal and superdiagonal of that matrix; and call bd_svd on those two vectors to obtain the singular values and the iteration count.
``Latent semantic indexing'' is a very interesting idea that uses
SVD to help index natural
language documents. One explanation that is more complete
than the one here can be found in the Wikipedia:
http://en.wikipedia.org/wiki/Latent_semantic_analysis.
Suppose you are given a number of documents and a number (not the same number) of words that appear in some or all of the documents. First, create an ``incidence matrix'' that has a 1 in position $(k,j)$ if the $k^{\rm th}$ document contains at least one occurrence of the $j^{\rm th}$ word, and 0 otherwise.
It is important to realize that if you are given words and construct a vector with a 1 in the positions corresponding to the given words, then multiplying the incidence matrix by this vector will yield a vector with integer entries, the $k^{\rm th}$ of which indicates how many of the given words appear in the $k^{\rm th}$ document.
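As a small illustration (the incidence matrix below is made up for this example; it is not the data used in the exercise that follows):

% rows = documents, columns = words (hypothetical data)
M = [1 0 1 0        % document 1 contains words 1 and 3
     0 1 1 1        % document 2 contains words 2, 3 and 4
     1 1 0 0];      % document 3 contains words 1 and 2
v = [0; 0; 1; 1];   % query vector: words 3 and 4
t = M * v           % t = [1; 2; 0]: how many query words in each document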
It is natural, then, to compute the SVD of the incidence matrix and keep only the singular vectors with large singular values, just as was done earlier in compressing the NASA picture. The SVD comes with a bonus: the right singular vectors (the columns of $V$) provide groupings of words that are related according to their use in the documents, and can be regarded as ``concepts'' appearing in the documents. The left singular vectors (the columns of $U$) provide groupings of the documents according to the words. The most surprising of the features of the SVD is that multiplication of the incidence matrix by a vector representing a group of words will identify precisely those documents containing the words, but a similar multiplication by the matrix constructed from only a few singular values will identify documents containing related words!
In the following exercise, you will see an example of this process from which you can draw your own conclusions.
To generate the exercise, I took a list of titles of publications from the Math Department web site a few years ago and extracted a list of words and constructed an incidence matrix from the titles (regarded as ``documents'') and the words. You will see how the SVD is able to identify ``concepts'' consisting of groups of words related because of the way they are used in the titles. You will also see how one could use the SVD to identify documents that employ certain words.
I must mention a caveat. This example is flawed by its use of titles and authors from departmental research reports. In the first place, the number of titles is not large, and in the second place, the titles themselves are too short to clearly elicit relationships among words. Titles and authors are used because it was convenient to gather the data and because you students might be familiar with the authors and topics. A better choice would have been the abstracts of the articles instead of their titles, but these were not conveniently available.
Results in the following exercise can be better understood if you are aware of various specialized groups of Mathematics faculty. For example, the Mathematics Department web pages describe the Mathematical Biology group as consisting of Professors Doiron, Ermentrout, Rubin, Swigon, and Troy, and the Numerical Analysis and Scientific Computing group as consisting of Professors Layton, Neilan, Trenchea, and Yotov. Prof. Neilan joined the faculty after the list of publications used in this exercise was constructed.
Before continuing, you need to understand how a new Matlab data structure is used. It is called a ``cell array'' and you can think of it as an array of cells in a spreadsheet. The contents of the cells can be just about anything and in this case will be strings of varying lengths. An ordinary Matlab array of strings would have to have all strings of the same length, which is not a convenient way to store titles of papers.
A cell array is indicated by ``braces'' or ``curly brackets'': ``{'' and ``}''. An ordinary column vector of numbers might be defined
v=[1;2;3;4;5];
while a cell array containing strings would be defined using curly brackets as
c={'one';'second string';'third';'4';'the fifth string is long.'}
There are two ways to refer to the contents of a cell array.
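For example (standard Matlab behavior, using the cell array c just defined), indexing with parentheses returns a smaller cell array, while indexing with curly brackets returns the contents of the cell:

c(2)          % a 1x1 cell array containing the string 'second string'
c{2}          % the string 'second string' itself
class(c(2))   % 'cell'
class(c{2})   % 'char'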
Cell arrays can contain any Matlab data in their cells. A cell array might have a string in its first position, a matrix in its second position, a vector in its third position, and another cell array in its fourth position. This flexibility provides great power to the script writer, but we will only be using cell arrays that contain strings of varying lengths; that is exactly why we use cell arrays here rather than ordinary string arrays, which can contain only strings of the same length.
For the following exercise, I have constructed three cell arrays and a matrix, and they are included in the file reports_data.m.
Please answer the following questions.
ss = 'The cascade of Brinkman code';
use word_vec to construct a vector, v, corresponding to the words appearing in ss. How many of the distinguished words appear in ss? Check that v is correct with the following commands.
words{find(v==1)}
Be sure that the words printed are the ones appearing in ss.
C = U(:,1:25) * S(1:25,1:25) * V(:,1:25)';
t = C * v;
where v is the vector associated with the single word ``canard,'' will yield a selection of titles satisfying the criterion abs(t) > 0.2. Please include the list of titles in your summary. It is interesting that one of the titles containing the word ``canard'' itself does not appear in this list but that all the titles refer to behavior of excited synapses in neurons, from which one could conclude that the word ``canard'' refers to the behavior of excited synapses under certain circumstances.