Paul's Online Math Notes
     
 
Online Notes / Linear Algebra / Vector Spaces / Least Squares
Linear Algebra

You can navigate through this E-Book using the menu to the left. For E-Books that have a Chapter/Section organization each option in the menu to the left indicates a chapter and will open a menu showing the sections in that chapter. Alternatively, you can navigate to the next/previous section or chapter by clicking the links in the boxes at the very top and bottom of the material.

Also, depending upon the E-Book, it will be possible to download the complete E-Book, the chapter containing the current section and/or the current section. You can do this be clicking on the E-Book, Chapter, and/or the Section link provided below.

For those pages with mathematics on them you can, in most cases, enlarge the mathematics portion by clicking on the equation. Click the enlarged version to hide it.

In this section we’re going to take a look at an important application of orthogonal projections to inconsistent systems of equations.  Recall that a system is called inconsistent if there are no solutions to the system.  The natural question should probably arise at this point of just why we would care about this.  Let’s take a look at the following examples that we can use to motivate the reason for looking into this.

 

Example 1  Find the equation of the line that runs through the four points , ,  and .

 

Solution

So, what we’re looking for are the values of m and b for which the line,  will run through the four points given above.  If we plug these points into the line we arrive at the following system of equations.

                                                              

 

The corresponding matrix form of this system is,

                                                       

Solving this system (either the matrix form or the equations) gives us the solution,

                                                        

 

So, the line  will run through the three points given above.  Note that this makes this a consistent system.

 

Example 2  Find the equation of the line that runs through the four points , ,  and .

 

Solution

So, this is essentially the same problem as in the previous example.  Here are the system of equations and matrix form of the system of equations that we need to solve for this problem.

                            

Now, try as we might we won’t find a solution to this system and so this system is inconsistent.

 

The previous two examples were asking for pretty much the same thing and in the first example we were able to answer the question while in the second we were not able to answer the question.  It is the second example that we want to look at a little closer.  Here is a graph of the points given in this example.

LeastSquares_G1

 

We can see that these points do almost fall on a line.  Without the reference line that we put into the sketch it would not be clear that these points did not fall onto a line and so asking the question that we did was not totally unreasonable.

 

Let’s further suppose that the four points that we have in this example came from some experiment and we know for some physical reason that the data should all lie on a straight line.  However, due to inaccuracies in the measuring equipment caused some (or all) of the numbers to be a little off.  In light of this the question in Example 2 is again not unreasonable and in fact we may still need to answer it in some way.

 

That is the point of this section.  Given this set of data can we find the equation of a line that will as closely as possible (whatever this means…) approximate each of the data points.  Or more generally, given an inconsistent system of equations, , can we find a vector, let’s call it , so that  will be as close to b as possible (again, what ever this means…).

 

To answer this question let’s step back a bit and take a look at the general situation.  So, we will suppose that we have an inconsistent system of n equations in m unknowns, ,  so the coefficient matrix, A, will have size .  Let’s rewrite the system a little and make the following definition.

 

 

 

We will call  the error vector and we’ll call  the error since it will measure the distance between Ax and b for any vector x in  (there are m unknowns and so x will be in  ).  Note that we’re going to be using the standard Euclidean inner product to compute the norm in these cases.  The least squares problem is then the following problem.

 

Least Square Problem Given an inconsistent system of equations, , we want to find a vector, , from  so that the error  is the smallest possible error.  The vector  is called the least squares solution.

 

Solving this problem is actually easier than it might look at first.  The first thing that we’ll want to do is look at a more general situation.  The following theorem will be useful in solving the least squares problem.

 

Theorem 1 Suppose that W is a finite dimensional subspace of an inner product space V and that u is any vector in V.  The best approximation to u from W is then .  By best approximation we mean that for every w (that is not  ) in W  we will have,

                                                        

 

Proof : For any vector w in W we can write.

 

 

 

Notice that  is a difference of vectors in W and hence must also be in W.  Likewise,  is in fact , the component of u orthogonal to W, and so is orthogonal to any vector in W.  Therefore  and  are orthogonal vectors.  So, by the Pythagorean Theorem we have,

 

 

Or, upon dropping the middle term,

 

 

 

Finally, if we have  then we know that  and so if we drop this term we get,

 

 

 

This is equivalent to,

 

 

and so we’re done.

Pf_Box

 

So, just what does this theorem do for us?  Well for any vector x in  we know that Ax will be a linear combination of the column vectors from A.  Now, let W be the subspace of  (yes,  since each column of A has n entries) that is spanned by the column vectors of A.  Then Ax will not only be in W (since it’s a linear combination of the column vectors) but as we let x range over all possible vectors in   Ax will range over all of W.

 

Now, the least squares problem is asking us to find the vector x in , we’re calling it , so that  is smaller than (i.e. smaller norm) than all other possible values of , i.e. .  If we plug in for the definition of the errors we arrive at.

 

 

 

With the least squares problem we are looking for the closest that we can get Ax to b.  However, this is exactly the type of situation that Theorem 1 is telling us how to solve.  The Ax range over all possible vectors in W and we want the one that is closed to some vector b in .  Theorem 1 tells us that the one that we’re after is,

 

 

 

Of course we are actually after  and not  but this does give us one way to find .  We could first compute  and then solve  for  and we’d have the solution that we’re after.  There is however a better way of doing this.

 

Before we give that theorem however, we’ll need a quick fact.

 

Theorem 2 Suppose that A is an  matrix with linearly independent columns.  Then,  is an invertible matrix.

 

Proof : From Theorem 8 in the Fundamental Subspaces section we know that if  has only the trivial solution then  will be an invertible matrix.  So, let’s suppose that .  This tells us that Ax is in the null space of , but we also know that Ax is in the column space of ATheorem 7 from the section on Inner Product Spaces tells us that these two spaces are orthogonal complements and Theorem 6 from the same section tells us that the only vector in common to both must be the zero vector and so we know that .  If , , … ,  are the columns of A then we know that Ax can be written as,

 

 

 

Then using  we also know that,

 

 

 

However, since the columns of A are linearly independent this equations can only have the trivial solution, .

 

Therefore  has only the trivial solution and so  is an invertible matrix.

Pf_Box

 

The following theorem will now give us a better method for finding the least squares solution to a system of equations.

 

Theorem 3 Given the system of equations , a least squares solution to the system denoted by , will also be a solution to the associated normal system,

                                                               

 

Further if A has linearly independent columns then there is a unique least squares solution given by,

                                                            

 

Proof :

Let’s suppose that  is a least squares solution and so,

 

 

 

Now, let’s consider,

 

 

However as pointed out in the proof of Theorem 1 we know that  is in the orthogonal complement of W.  Next, W is the column space of A and by Theorem 7 from the section on Inner Product Spaces we know that the orthogonal complement of the column space of A is in fact the null space of  and so,  must be in the null space of .

 

So, we must then have,

 

 

Or, with a little rewriting we arrive at,

 

 

and so we see that  must also be a solution to the normal system of equations.

 

For the second part we don’t have much to do.  If the columns of A are linearly independent then  is invertible by Theorem 2 above.  However, by Theorem 8 in the Fundamental Subspaces section this means that  has a unique solution.

 

To find the unique solution we just need to multiply both sides by the inverse of .

Pf_Box

 

So, to find a least squares solution to  all we need to do is solve the normal system of equations,

 

 

and we will have a least squares solution.

 

Now we should work a couple of examples.  We’ll start with Example 2 from above.

 

Example 3  Use least squares to find the equation of the line that will best approximate the points , ,  and .

 

Solution

The system of equations that we need to solve from Example 2 is,

                                                       

 

So, we have,

                   

 

The normal system that we need to solve is then,

                      

 

This is a fairly simple system to solve and upon doing so we get,

                               

 

So, the line that best approximates all the points above is given by,

                                                           

 

The sketch of the line and points after Example 2 above shows this line in relation to the points.

 

Example 4  Find the least squares solution to the following system of equations.

                                                    

Solution

Okay there really isn’t much to do here other than run through the formula.  Here are the various matrices that we’ll need here.

              

 

                                 

 

The normal system if then,

                                                

 

This system is a little messier to solve than the previous example, but upon solving we get,

                              

 

In vector form the least squares solution is then,

                                                                 

 

We need to address one more issues before we move on to the next section.  When we opened this discussion up we said that we were after a solution, denoted , so that  will be as close to b as possible in some way.  We then defined,

 

 

and stated that what we meant by as close to b as possible was that we wanted to find the  for which,

 

 

for all .

 

Okay, this is all fine in terms of mathematically defining what we mean by “as close as possible”, but in practical terms just what are we asking for here?  Let’s go back to Example 3.  For this example the general formula for  is,

 

 

 

So, the components of the error vector, , each measure just how close each possible choice of m and b will get us to the exact answer (which is given by the components of b).

 

We can also think about this in terms of the equation of the line.  We’ve been given a set of points  and we want to determine an m and a b so that when we plug , the x coordinate or our point, into  the error,

 

is as small as possible (in some way that we’re trying to figure out here) for all the points that we’ve been given.  Then if we plug in the points that we’ve been given we’ll see that this formula is nothing more than the components of the error vector.

 

Now, in the case of our example we were looking for,

 

 

so that,

 

 

is as small as possible, or in other words is smaller than all other possible choices of x.

 

We can now answer just what we mean by “as small as possible”.  First, let’s compute the following,

 

 

 

The least squares solution, , will be the value of x for which,

 

 

and hence the name “least squares”.  The solution we’re after is the value that will give the least value of the sum of the squares of the errors.

 

Example 5  Compute the error for the solution from Example 3.

 

Solution

First, the line that we found using least squares is,

                                                           

 

We can compute the errors for each of the points by plugging in the given x value into this line and then taking the difference of the result form the equation and the known y value.  Here are the error computations for each of the four points in Example 3.

                                            

 

On a side note, we could have just as easily computed these by doing the following matrix work.

                                         

 

The square of the error and the error is then,

       

 

Now, according to our discussion above this means that if we choose any other value of m and b and compute the error we will arrive at a value that is larger that 8.0125.


Online Notes / Linear Algebra / Vector Spaces / Least Squares

[Contact Me] [Links] [Privacy Statement] [Site Map] [Terms of Use] [Menus by Milonic]

© 2003 - 2009 Paul Dawkins