The methods for solving matrix equations that we have seen already are called ``direct'' methods because they involve an algorithm that is guaranteed to terminate in a predictable number of steps. There are alternative methods, called ``iterative'' because they involve a repetitive algorithm that terminates when some specified tolerance is met, as with the eigenpair and singular value methods in earlier labs. In this lab, we will focus on the conjugate gradient method applied to matrices arising from elliptic partial differential equations. The conjugate gradient method can be regarded as a direct method because in exact arithmetic it terminates after $n$ steps for an $n\times n$ matrix. In computer arithmetic, however, the termination property is lost, and it is irrelevant in many cases because the iterates often converge in far fewer than $n$ steps.
Iterative methods are most often applied to matrices that are ``sparse'' in the sense that their entries are mostly zeros. If most of the entries are zero, the matrix can be stored more efficiently than in the usual form by omitting the zeros, and iterative methods can take advantage of this efficient storage much more easily than direct methods can. Further, partial differential equations often give rise to large sparse matrices, so iterative methods are commonly used to solve the systems arising from PDEs.
You will see the conjugate gradient method (regarded as an iterative method), and will deploy it using matrices stored in their usual form and also in a compact form. You will see examples of the rapid convergence of the method and you will solve a matrix system (stored in compact form) that is so large that you could not even construct and store the matrix in its usual form in central memory, let alone invert it.
This lab will take three sessions.
In two dimensions, the Poisson equation can be written as
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x,y). \tag{1}$$
Suppose that the unit square is broken into smaller, nonoverlapping squares, each of side $h$. These small squares will be called ``elements'' and their corner points will be called ``nodes.'' The combination of all elements making up the unit square will be called a ``mesh.'' An example of a mesh is shown in Figure 1.
The nodes in Figure 1 have coordinates $(x_j, y_k)$ given by
$$x_j = jh \qquad \text{for } j = 0, 1, \dots, N+1,$$
$$y_k = kh \qquad \text{for } k = 0, 1, \dots, N+1, \tag{2}$$
where $h = 1/(N+1)$.
You will be generating the matrix associated with the Poisson equation during this lab, using one of the simplest discretizations of this equation, based on the finite difference expression for a second derivative
$$\frac{d^2 u}{dx^2}(x) \approx \frac{u(x-h) - 2u(x) + u(x+h)}{h^2}. \tag{3}$$
Applying (3) in both the $x$ and $y$ directions at the node $(x_j, y_k)$ yields the discrete equation
$$\frac{u_{j-1,k} + u_{j+1,k} + u_{j,k-1} + u_{j,k+1} - 4u_{j,k}}{h^2} = f_{j,k}, \tag{4}$$
where $u_{j,k}$ denotes the approximate value of $u(x_j, y_k)$ and $f_{j,k} = f(x_j, y_k)$.
The matrix equation (system of equations) can be generated from (4) by numbering the nodes in some way and forming a vector from all the nodal values of $u$. Then, (4) represents one row of a matrix equation $Au = f$. It is immediate that there are at most five non-zero terms in each row of the matrix $A$, no matter how large $N$ is. The diagonal term is $-4/h^2$, the other four, if present, are each equal to $1/h^2$, and all remaining terms are zero. We will be constructing such a matrix in Matlab during this lab.
The equation in the form (4) yields the ``stencil'' of the difference scheme, and is sometimes illustrated graphically as in Figure 2 below. Other stencils are possible and are associated with other difference schemes.
In order to construct matrices arising from the Poisson differential equation, it is necessary to give each node at which a solution is desired a number. We will be considering only the case of Dirichlet boundary conditions ($u = 0$ at all boundary points), so we are only interested in the interior nodes illustrated in Figure 1. These nodes will be numbered consecutively, starting with 1 on the lower left and increasing up and to the right to 100 at the upper right, as illustrated in Figure 3. This is just one possible way of numbering nodes, and many other ways are often used. Changing the numbering changes the rows and columns of the resulting matrix.
N=10;
h=1/(N+1);
n=0;
% initialize x and y as column vectors
x=zeros(N^2,1);
y=zeros(N^2,1);
for j=1:N
  for k=1:N
    % xNode and yNode should be between 0 and 1
    xNode=???
    yNode=???
    n=n+1;
    x(n)=xNode;
    y(n)=yNode;
  end
end
plot(x,y,'o');
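If you get stuck, here is one plausible completion of the two ??? lines, consistent with the numbering of Figure 3; treat it as a hint to check against your own reasoning rather than the only correct choice:

xNode=j*h;   % x is constant while the inner loop runs up one column
yNode=k*h;   % y increases with k, so node n+1 sits directly above node n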
function tests(n)
% function tests(n) prints index numbers of points near the
% one numbered n
% your name and the date.

N=10;
if mod(n,N)~=0
  nAbove=n+1
else
  nAbove='none'
end

Copy this code to a file named tests.m and test it by applying it with n=75 to find the number of the point above the one numbered 75. Test it also for n=80 to see that there is no point above the one numbered 80.
You are now ready to construct a matrix derived from the Poisson equation with Dirichlet boundary conditions on a uniform mesh. In this lab we are considering only one of the infinitely many possibilities for such a matrix. The matrix defined above turns out to be negative definite, so the matrix we will use in this lab is its negative, whose diagonal entries are $4/h^2$ and whose non-zero off-diagonal entries are $-1/h^2$; this negated matrix is positive definite.
function A=poissonmatrix(N)
% A=poissonmatrix(N)
% more comments
% your name and the date

h=1/(N+1);
A=zeros(N^2,N^2);
for n=1:N^2
  % center point value
  A(n,n)=4/h^2;
  % "above" point value
  if mod(n,N)~=0
    nAbove=n+1;
    A(n,nAbove)=-1/h^2;
  end
  % "below" point value
  if ???
    nBelow= ???;
    A(n,nBelow)=-1/h^2;
  end
  % "left" point value
  if ???
    nLeft= ???;
    A(n,nLeft)=-1/h^2;
  end
  % "right" point value
  if ???
    nRight= ???;
    A(n,nRight)=-1/h^2;
  end
end

In the following parts of the exercise, take N=10 and A=poissonmatrix(N).
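If you are unsure of the index arithmetic, here is one plausible way to fill in the remaining tests, assuming the numbering of Figure 3 (node n+1 above, n-1 below, n-N to the left, n+N to the right); use it to check your own reasoning:

% "below" exists unless n is at the bottom of a column
if mod(n-1,N)~=0
  nBelow=n-1;
  A(n,nBelow)=-1/h^2;
end
% "left" exists unless n is in the first column
if n>N
  nLeft=n-N;
  A(n,nLeft)=-1/h^2;
end
% "right" exists unless n is in the last column
if n<=N^2-N
  nRight=n+N;
  A(n,nRight)=-1/h^2;
end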
% maximum of each column of nonSymm
[rowvals]=max(nonSymm);
% largest of those maxima and its column index m
[colval,m]=max(rowvals);
% row index ell of the largest entry in column m
[maxval,ell]=max(nonSymm(:,m));
The functions $\sin(k\pi x)\sin(\ell\pi y)$ are eigenfunctions of the Dirichlet problem for Poisson's equation on the unit square for each choice of $k$ and $\ell$; for the negated operator used to build $A$, the corresponding eigenvalues are $(k^2+\ell^2)\pi^2$. It turns out that, for each value of k and ell, the vector E=sin(k*pi*x).*sin(ell*pi*y) is also an eigenvector of the matrix A, where x and y are the vectors from Exercise 1. The eigenvalue is L=2*(2-cos(k*pi*h)-cos(ell*pi*h))/h^2, where h=y(2)-y(1). Choose k=1 and ell=2. What is L and, for aid in grading your work, E(10)? (E(10) should be negative and less than 1 in absolute value.) Show by computation that L*E=A*E, up to roundoff. If this is not true for your matrix A, go back and fix poissonmatrix.
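A minimal script for this check might look like the following sketch; it assumes x, y, and A are already in your workspace from the earlier exercises:

k=1; ell=2;
h=y(2)-y(1);                             % mesh spacing recovered from Exercise 1
E=sin(k*pi*x).*sin(ell*pi*y);            % candidate eigenvector
L=2*(2-cos(k*pi*h)-cos(ell*pi*h))/h^2;   % predicted eigenvalue
norm(L*E-A*E)                            % should be roundoff-small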
You now have a first good matrix to use for the study of the conjugate gradient iteration in the following section. In the following exercise you will generate another.
A(n,n)     = 8*n+2*N+2;
A(n,nAbove)=-2*n-1;
A(n,nBelow)=-2*n+1;
A(n,nLeft) =-2*n+N;
A(n,nRight)=-2*n-N;

This matrix has the same structure as that from poissonmatrix, but the values are different.
spy(poissonmatrix(12),'b*'); % blue spots
hold on

Next, see that anothermatrix has non-zeros everywhere poissonmatrix does by overlaying its plot on top of the previous.
spy(anothermatrix(12),'y*'); % yellow spots cover all blue

Finally, confirm that anothermatrix has no new non-zeros by overlaying the plot of poissonmatrix.
spy(poissonmatrix(12),'r*'); % red spots cover all yellow

Please include this plot with your summary.
N=15;
A=anothermatrix(N);
x=((1:N^2).^2)';
sum(A*x)
One very important point to remember about the conjugate gradient method (and other Krylov methods) is that only the matrix-vector product is required. One can write and debug the method using ordinary matrix arithmetic and then apply it using other storage methods, such as the compact matrix storage used below.
A second very important point to remember is that the conjugate gradient algorithm requires the matrix to be positive definite. You will note that the algorithm below contains a step with $p \cdot (Ap)$ in the denominator. This quantity is guaranteed to be nonzero only if $A$ is positive (or negative) definite. If $A$ is negative definite, multiply the equation through by $(-1)$, as we have done with the Poisson equation.
The conjugate gradient algorithm can be described in the following way. Starting with an initial guess vector $x^{(0)}$, set $r^{(0)} = b - Ax^{(0)}$ and then, for $k = 1, 2, \dots, m$:

$\rho_{k-1} = r^{(k-1)} \cdot r^{(k-1)}$

if $k = 1$, set $p^{(1)} = r^{(0)}$; otherwise set $\beta_{k-1} = \rho_{k-1}/\rho_{k-2}$ and $p^{(k)} = r^{(k-1)} + \beta_{k-1}\,p^{(k-1)}$

$q^{(k)} = A p^{(k)}$

$\alpha_k = \rho_{k-1} / \bigl(p^{(k)} \cdot q^{(k)}\bigr)$

$x^{(k)} = x^{(k-1)} + \alpha_k\,p^{(k)}$

$r^{(k)} = r^{(k-1)} - \alpha_k\,q^{(k)}$
In the following exercise you will first write a function to perform the conjugate gradient algorithm. You will test it without using hand calculations.
function x=cgm(A,b,x,m)
% x=cgm(A,b,x,m)
% more comments
% your name and the date

that performs m iterations of the conjugate gradient algorithm for the problem A*y=b, with starting vector x. Do not create vectors for the scalar variables $\alpha_k$, $\beta_k$, and $\rho_k$; instead use simple Matlab scalars alpha for $\alpha_k$, beta for $\beta_{k-1}$, rhoKm1 for $\rho_{k-1}$, and rhoKm2 for $\rho_{k-2}$. Similarly, denote the current vectors $r^{(k)}$, $p^{(k)}$, $q^{(k)}$, and $x^{(k)}$ by the Matlab variables r, p, q, and x, respectively. For example, part of your loop might look like
rhoKm1=0;
for k=1:m
  rhoKm2=rhoKm1;
  rhoKm1=dot(r,r);
  ???
  r=r-alpha*q;
end
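For comparison with your own work, here is one way the whole loop could be arranged, following the algorithm description above; treat it as a sketch to check against, since the handling of the first iteration admits several equivalent choices:

r=b-A*x;               % initial residual
rhoKm1=0;
for k=1:m
  rhoKm2=rhoKm1;
  rhoKm1=dot(r,r);
  if k==1
    p=r;               % first search direction is the residual
  else
    beta=rhoKm1/rhoKm2;
    p=r+beta*p;        % conjugate search direction
  end
  q=A*p;
  alpha=rhoKm1/dot(p,q);
  x=x+alpha*p;         % update the solution estimate
  r=r-alpha*q;         % update the residual without forming b-A*x
end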
The point of the last part of the previous exercise is that there is no need to iterate to the theoretical limit. Conjugate gradients serves quite well as an iterative method, especially for large systems. Regarding it as an iterative method, however, requires some way of deciding when to stop before doing $n$ iterations on an $n\times n$ matrix. Assume that a reasonable stopping criterion is when the relative residual error $\|r^{(k)}\|/\|b\|$ is small. (If $b = 0$, then the system is trivial, and so is the solution.) Without an estimate of the condition number of $A$, this criterion does not bound the solution error, but it will do for our purpose. A simple induction argument will show that $r^{(k)} = b - Ax^{(k)}$, so it is not necessary to perform an extra matrix-vector multiplication to determine whether or not to stop the iteration. Conveniently, the quantity $\rho_{k-1}$ that is computed at the beginning of the loop is the square of the norm of $r^{(k-1)}$, so the iteration can be terminated without any extra calculation.
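The induction step is quick to verify: if $r^{(k-1)} = b - Ax^{(k-1)}$, then
$$r^{(k)} = r^{(k-1)} - \alpha_k q^{(k)} = \bigl(b - Ax^{(k-1)}\bigr) - \alpha_k A p^{(k)} = b - A\bigl(x^{(k-1)} + \alpha_k p^{(k)}\bigr) = b - Ax^{(k)},$$
and the base case $r^{(0)} = b - Ax^{(0)}$ holds by construction.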
In the following exercise, you will modify your cgm.m file by eliminating the parameter m in favor of a specified tolerance on the relative residual.
[x,k]=cg(A,b,x,tolerance)

so that m is no longer input. Replace the value m in the for statement with size(A,1), because you know convergence should be reached at that point. Compute targetValue=tolerance^2*normBsquare. Replace the semilogy plot with the exit commands
if rhoKm1 < targetValue
  return
end

There is no longer any need for relativeResidual or the plot.
A matrix A generated by either poissonmatrix or anothermatrix has only five non-zero entries in each row. When numbered as above, it has exactly five non-zero diagonals: the main diagonal, the super- and sub-diagonals, and the two diagonals that are N columns away from the main diagonal. All other entries are zero. It is prudent to store A in a much more compact form by storing only the diagonals and none of the other zeros. Further, A is symmetric, so it is only necessary to store three of the five diagonals. While not all matrices have such a nice and regular structure as these, it is very common to encounter matrices that are sparse (mostly zero entries). In the exercises below, you will be using a very simple compact storage, based on diagonals, but you will get a flavor of how more general compact storage methods can be used.
Remark: Since the $N^2\times3$ ``by diagonals'' array is rectangular while the three stored diagonals have different lengths, there will still be some extraneous zeros in this storage strategy.
Remark: Since poissonmatrix contains only two different numbers, it can be stored in an even more efficient manner, but we will not be taking advantage of this strategy in this lab.
Since only three diagonals need to be stored, you will see how to use an $N^2\times3$ rectangular matrix to store a matrix that would be $N^2\times N^2$ if ordinary storage were used. Further, since the matrices from poissonmatrix and anothermatrix are always $N^2\times N^2$, we will only consider the case that the matrix dimension is $M = N^2$. The ``main diagonal'' is the one consisting of entries $A_{k,k}$, the ``superdiagonal'' is the one consisting of entries $A_{k,k+1}$, and the non-zero diagonal farthest from the main diagonal consists of entries $A_{k,k+N}$, for $k = 1, \dots, M-N$. For the purpose of this lab, we will call the latter diagonal the ``far diagonal.''
Conveniently, Matlab provides a function to extract these diagonals from an ordinary square matrix and to construct an ordinary square matrix from diagonals. It is called diag and it works in the following way. Assume that $A$ is an $M\times M$ square matrix with entries $A_{k\ell}$. Then diag(A) returns the main diagonal of $A$ as a column vector, and diag(A,m) returns the $m$-th superdiagonal (for $m>0$) or the $|m|$-th subdiagonal (for $m<0$) as a column vector of length $M-|m|$. Conversely, if v is a vector, diag(v,m) constructs a square matrix with v placed on its $m$-th diagonal and zeros elsewhere.
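A small illustration (the matrix here is an arbitrary example, not part of the lab):

A=magic(4);                          % any 4x4 matrix
d0=diag(A);                          % main diagonal, a 4x1 column vector
d1=diag(A,1);                        % superdiagonal, a 3x1 column vector
T=diag(d0)+diag(d1,1)+diag(d1,-1);   % rebuild a symmetric tridiagonal matrix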
In the following exercise, you will rewrite poissonmatrix and anothermatrix to use storage by diagonals.
N=10;
A=anothermatrix(N);
Adiags=anotherdiags(N);
size(Adiags,2)-3                     % should be zero
norm(Adiags(:,1)-diag(A))            % should be zero
norm(Adiags((1:N^2-1),2)-diag(A,1))  % should be zero
norm(Adiags((1:N^2-N),3)-diag(A,N))  % should be zero

If not all four numbers are zero, fix your code. If you are debugging, it is easier to use N=3.
If you look at the conjugate gradient algorithm, you will see that the only use the algorithm makes of the matrix is to multiply it by a vector. If we plan to store a matrix by diagonals, we need to write a special ``multiplication by diagonals'' function.
function y=multdiags(A,x)
% y=multdiags(A,x)
% more comments
% your name and the date

M=size(A,1);
N=round(sqrt(M));
if M~=N^2
  error('multdiags: matrix size is not a squared integer.')
end
if size(A,2) ~=3
  error('multdiags: matrix does not have 3 columns.')
end

% the diagonal product
y=A(:,1).*x;

% the superdiagonal product
for k=1:M-1
  y(k)=y(k)+A(k,2)*x(k+1);
end

% the subdiagonal product
???

% the far diagonal product
???

% the far subdiagonal product
???
x=ones(9,1);
B(k,:)*x
B(1,1)*x(1)+B(1,2)*x(2)+B(1,4)*x(4)

Check that you have this correct.
B(2,1)*x(1)+B(2,2)*x(2)+B(2,3)*x(3)+B(2,5)*x(5)

(This product includes the subdiagonal term.) Check that you have this correct and continue.
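For checking your work on the ??? sections of multdiags above, here is one plausible completion; it assumes the symmetric three-column storage described earlier, in which column 2 holds both the super- and subdiagonal values and column 3 both far diagonals:

% the subdiagonal product
for k=2:M
  y(k)=y(k)+A(k-1,2)*x(k-1);
end

% the far diagonal product
for k=1:M-N
  y(k)=y(k)+A(k,3)*x(k+N);
end

% the far subdiagonal product
for k=N+1:M
  y(k)=y(k)+A(k-N,3)*x(k-N);
end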
Remark: It is generally true that componentwise operations are faster than loops. The multdiags function can be made to execute faster by replacing the loops with componentwise operations. This sounds complicated, but there is an easy trick. Consider the superdiagonal code:
% the superdiagonal product
for k=1:M-1
  y(k)=y(k)+A(k,2)*x(k+1);
end

Look carefully at the following code. It is equivalent, very similar in appearance, and will execute more quickly.
k=1:M-1;  % k is a vector!
y(k)=y(k)+A(k,2).*x(k+1);

Then manually replacing the vector k and eliminating it generates the fastest code:
y(1:M-1)=y(1:M-1)+A(1:M-1,2).*x(2:M);

You are not required to make these changes for this lab, but you might be interested in trying.
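The same trick applies to the other diagonals; for example, the far diagonal loop from multdiags could become the following one-liner (a sketch, assuming the loop bounds used above):

y(1:M-N)=y(1:M-N)+A(1:M-N,3).*x(N+1:M);  % far diagonal product, vectorized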
Now is the time to put conjugate gradients together with storage by diagonals. This will allow you to handle much larger matrices than using ordinary square storage.
N=500;
xExact=rand(N^2,1);
Adiags=poissondiags(N);
b=multdiags(Adiags,xExact);
tic;[y,n]=cgdiags(Adiags,b,zeros(N^2,1),1.e-6);toc

How many iterations did it take? How much time did it take?
In the previous exercise, you saw that the conjugate gradient method works as well using storage by diagonals as using normal storage. You also saw that you could solve a matrix system of size $250000\times250000$ ($N = 500$), stored by diagonals. This matrix is actually quite large, and you would not be able even to construct it, let alone solve it by a direct method, using ordinary storage.
A=gallery('toeppen',8,1,2,3,4,5)

and you should be able to see how it is stored. From what you just printed, what do you think A(2,1), A(3,2) and A(5,1) are?
B=[ 1 0 0 2
    0 0 3 0
    0 4 0 0
    5 0 6 0];

If B were stored using Matlab's sparse storage method, what would be the result of printing B (as you printed A)?
The conjugate gradient method has many advantages, but it can be slow to converge, a disadvantage it shares with other Krylov-based iterative methods. Since Krylov methods are perhaps the most widely-used iterative methods, it is not surprising that there is a large effort to devise ways to make them converge faster. The largest class of such convergence-enhancing methods is called ``preconditioning'' and is sometimes explained as solving the system $M^{-1}Ax = M^{-1}b$ instead of $Ax = b$, where $M$ is a suitably chosen, easily inverted matrix. More precisely, writing $M = U^TU$, it solves the symmetric system $(U^{-T}AU^{-1})(Ux) = U^{-T}b$.
The preconditioned conjugate gradient algorithm can be described in the following way. Starting with an initial guess vector $x^{(0)}$, set $r^{(0)} = b - Ax^{(0)}$ and then, for $k = 1, 2, \dots$:

solve $Mz^{(k-1)} = r^{(k-1)}$ for $z^{(k-1)}$

$\rho_{k-1} = r^{(k-1)} \cdot z^{(k-1)}$

if $k = 1$, set $p^{(1)} = z^{(0)}$; otherwise set $\beta_{k-1} = \rho_{k-1}/\rho_{k-2}$ and $p^{(k)} = z^{(k-1)} + \beta_{k-1}\,p^{(k-1)}$

$q^{(k)} = A p^{(k)}$

$\alpha_k = \rho_{k-1} / \bigl(p^{(k)} \cdot q^{(k)}\bigr)$

$x^{(k)} = x^{(k-1)} + \alpha_k\,p^{(k)}$

$r^{(k)} = r^{(k-1)} - \alpha_k\,q^{(k)}$
One of the earliest effective preconditioners discovered was the so-called ``incomplete Cholesky'' preconditioner. Recall that a positive definite symmetric matrix $A$ can always be factored as $A = U^TU$, where $U$ is an upper-triangular matrix with positive diagonal entries. Although there are several different preconditioners with the same name, the one that is easiest to describe and use here is simply to take $U$ to be the true Cholesky factor of $A$ with all entries set to zero at positions where $A$ itself is zero, and then take $M = U^TU$.
For large, sparse matrices it is generally impossible to construct the true Cholesky factor $U$, and, if you can construct it, it is usually better to simply solve the system using it rather than some iterative method. On the other hand, there are ways of producing upper-triangular matrices that have the same sparsity pattern as $A$ and perform almost as well as the true factor. For our purposes in this lab, we will be using the incomplete Cholesky factor described above and seeing how it improves convergence of the conjugate gradient method.
function [x,k]=precg(U,A,b,x,tolerance)
% [x,k]=precg(U,A,b,x,tolerance)
% more comments
% your name and the date

where U is assumed to be an upper-triangular matrix, not necessarily the Cholesky factor of A, and M=U'*U. (Use Matlab's backslash operator twice, once for U' and once for U. Be sure to use parentheses so that Matlab does not misinterpret your formulæ.) Change the convergence criterion to dot(r,r)<targetValue, to be consistent with cg. Test it with U being the identity matrix, with A=poissonmatrix(10), the exact solution xExact=ones(100,1), b=A*xExact, and starting from x=zeros(100,1) with tolerance=1.e-10. You should get exactly the same values for x and k as produced by cg.
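The preconditioning step inside the loop might be coded like this minimal sketch, using two triangular solves instead of ever forming M or its inverse (the variable name z is an assumption, chosen to match the algorithm description above):

z=U\(U'\r);   % z = M^{-1}*r with M = U'*U; parentheses force the U' solve first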
N=10;
xExact=ones(N^2,1);   % exact solution
A=poissonmatrix(N);
b=A*xExact;           % right side vector
tolerance=1.e-10;
U=chol(A);            % Cholesky factor of A
x=zeros(N^2,1);

Use precg to solve this problem. Check that the value of k is 2 at convergence, and that the solution is correct.
U=chol(A);
U(find(A==0))=0;

(Or, equivalently, U(A==0)=0.) Consider the system $Ax = b$ starting from $x = 0$, for $A$ given by anothermatrix(50) and $b$ given as all ones, with a tolerance of 1.e-10. Find the exact solution as xExact=A\b. Compare the numbers of iterations and true errors between the conjugate gradient and ICCG methods. You should observe a significant reduction in the number of iterations without loss of accuracy.
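A driver script for this comparison might look like the following sketch; it assumes your functions are named cg and precg as in the earlier exercises:

N=50;
A=anothermatrix(N);
b=ones(N^2,1);
xExact=A\b;        % direct solution for comparison
U=chol(A);
U(A==0)=0;         % incomplete Cholesky factor
[xCG,kCG]=cg(A,b,zeros(N^2,1),1.e-10);
[xPCG,kPCG]=precg(U,A,b,zeros(N^2,1),1.e-10);
[kCG, kPCG]        % iteration counts
[norm(xCG-xExact)/norm(xExact), norm(xPCG-xExact)/norm(xExact)]  % true relative errors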