To this point we’ve only looked at the two partial
derivatives $f_x(x,y)$ and $f_y(x,y)$. Recall that these derivatives represent the
rate of change of f as we vary x (holding y fixed) and as we vary y
(holding x fixed) respectively. We now need to discuss how to find the rate
of change of f if we allow both x and y to change simultaneously.
The problem here is that there are many ways to allow both x and y to change. For instance,
one could be changing faster than the other and there is also the issue of
whether each is increasing or decreasing. So, before we get into finding the rate of
change we need to take care of a couple of preliminary ideas first. The main idea that we need to look at is just
how we are going to define the changing of x
and/or y.
Let’s start off by supposing that we wanted the rate of
change of f at a particular point,
say $(x_0, y_0)$. Let’s also suppose that both x and y are increasing and that, in this case, x is increasing twice as fast as y is increasing. So, as y increases one unit of measure x will increase two units of
measure.
To help us see how we’re going to define this change let’s
suppose that a particle is sitting at $(x_0, y_0)$ and the particle will move in the direction
given by the changing x and y.
Therefore, the particle will move off in a direction of increasing x and y and the x coordinate of
the point will increase twice as fast as the y coordinate. Now that we’re
thinking of this changing x and y as a direction of movement we can get
a way of defining the change. We know
from Calculus II that vectors can be used to define a direction and so the
particle, at this point, can be said to be moving in the direction,
$$\vec v = \langle 2, 1 \rangle$$
Since this vector can be used to define how a particle at a
point is changing we can also use it to describe how x and/or y is changing at
a point. For our example we will say
that we want the rate of change of f
in the direction of $\vec v = \langle 2, 1 \rangle$. In this way we will know that x is increasing twice as fast as y is.
There is still a small problem with this however. There are many vectors that point in the same
direction. For instance, all of the following
vectors point in the same direction as $\vec v = \langle 2, 1 \rangle$,
$$\vec v = \langle 4, 2 \rangle \qquad \vec v = \langle 6, 3 \rangle \qquad \vec v = \left\langle 1, \tfrac{1}{2} \right\rangle$$
We need a way to consistently find the rate of change of a
function in a given direction. We will
do this by insisting that the vector that defines the direction of change be a
unit vector. Recall that a unit vector
is a vector with length, or magnitude, of 1.
This means that for the example that we started off thinking about we
would want to use
$$\vec u = \left\langle \frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}} \right\rangle$$
since this is the unit vector that points in the direction
of change.
For reference purposes recall that the magnitude or length
of the vector $\vec v = \langle a, b, c \rangle$ is given by,
$$\left\| \vec v \right\| = \sqrt{a^2 + b^2 + c^2}$$
For two dimensional vectors we drop the c from the formula.
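The magnitude and normalization steps above can be sketched in a few lines of Python (the helper names here are our own, not from the notes):

```python
import math

def magnitude(v):
    """Length of a vector given as a tuple: sqrt(a^2 + b^2 + c^2)."""
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    """Scale v by 1/|v| to get the unit vector in the same direction."""
    m = magnitude(v)
    return tuple(c / m for c in v)

u = unit((2, 1))     # direction from the example above
print(u)             # (2/sqrt(5), 1/sqrt(5)) ≈ (0.894, 0.447)
print(magnitude(u))  # ≈ 1, since u is a unit vector
```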
Sometimes we will give the direction of changing x and y as an angle. For instance,
we may say that we want the rate of change of f in the direction of $\theta$. The unit vector that points in this direction
is given by,
$$\vec u = \langle \cos\theta, \sin\theta \rangle$$
Okay, now that we know how to define the direction of
changing x and y it’s time to start talking about finding the rate of change of f in this direction. Let’s start off with the official definition.
Definition
The rate of change of $f(x,y)$ in the direction of the unit vector $\vec u = \langle a, b \rangle$ is called the directional derivative and is denoted by $D_{\vec u}f(x,y)$. The definition of the directional derivative is,
$$D_{\vec u}f(x,y) = \lim_{h \to 0} \frac{f(x + ah, y + bh) - f(x,y)}{h}$$
So, the definition of the directional derivative is very
similar to the definition of partial derivatives.
However, in practice this can be a very difficult limit to compute so we
need an easier way of taking directional derivatives. It’s actually fairly simple to derive an equivalent
formula for taking directional derivatives.
To see how we can do this let’s define a new function of a
single variable,
$$g(z) = f(x_0 + az, y_0 + bz)$$
where $x_0$, $y_0$,
a, and b are some fixed numbers.
Note that this really is a function of a single variable now since z is the only letter that is not
representing a fixed number.
Then by the definition of the derivative for functions of a
single variable we have,
$$g'(z) = \lim_{h \to 0} \frac{g(z + h) - g(z)}{h}$$
and the derivative at $z = 0$ is given by,
$$g'(0) = \lim_{h \to 0} \frac{g(h) - g(0)}{h}$$
If we now substitute in for $g(z)$ we get,
$$g'(0) = \lim_{h \to 0} \frac{g(h) - g(0)}{h} = \lim_{h \to 0} \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0)}{h} = D_{\vec u}f(x_0, y_0)$$
So, it looks like we have the following relationship.
$$g'(0) = D_{\vec u}f(x_0, y_0) \qquad (1)$$
Now, let’s look at this from another perspective. Let’s rewrite $g(z)$ as follows,
$$g(z) = f(x, y) \qquad \text{where } x = x_0 + az \text{ and } y = y_0 + bz$$
We can now use the chain rule from the previous section to
compute,
$$g'(z) = \frac{dg}{dz} = \frac{\partial f}{\partial x}\frac{dx}{dz} + \frac{\partial f}{\partial y}\frac{dy}{dz} = f_x(x,y)\,a + f_y(x,y)\,b$$
So, from the chain rule we get the following relationship.
$$g'(z) = f_x(x,y)\,a + f_y(x,y)\,b \qquad (2)$$
If we now take $z = 0$ we will get that $x = x_0$ and $y = y_0$ (from how we defined x and y above) and plug
these into (2)
we get,
$$g'(0) = f_x(x_0, y_0)\,a + f_y(x_0, y_0)\,b \qquad (3)$$
Now, simply equate (1) and
(3)
to get that,
$$D_{\vec u}f(x_0, y_0) = g'(0) = f_x(x_0, y_0)\,a + f_y(x_0, y_0)\,b$$
If we now go back to allowing x and y to be any number
we get the following formula for computing directional derivatives.
$$D_{\vec u}f(x,y) = f_x(x,y)\,a + f_y(x,y)\,b$$
This is much simpler than the limit definition. Also note that this definition assumed that
we were working with functions of two variables. There are similar formulas that can be
derived by the same type of argument for functions with more than two variables. For instance, the directional derivative of $f(x,y,z)$ in the direction of the unit vector $\vec u = \langle a, b, c \rangle$ is given by,
$$D_{\vec u}f(x,y,z) = f_x(x,y,z)\,a + f_y(x,y,z)\,b + f_z(x,y,z)\,c$$
Let’s work a couple of examples.
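As a quick numerical sketch of the formula (the function here is a made-up example of our own, not one of the notes’ worked examples), we can compute a directional derivative both from the formula and from the limit definition and confirm they agree:

```python
import math

# Sample function f(x, y) = x^2 * y with partials f_x = 2xy, f_y = x^2
# (our own example, not from the notes)
def f(x, y):
    return x**2 * y

def fx(x, y):
    return 2 * x * y

def fy(x, y):
    return x**2

# Unit vector u = <a, b> in the direction of v = <2, 1>
a, b = 2 / math.sqrt(5), 1 / math.sqrt(5)

# Directional derivative at (1, 2) from the formula D_u f = f_x a + f_y b
x0, y0 = 1.0, 2.0
exact = fx(x0, y0) * a + fy(x0, y0) * b   # 4a + b = 9/sqrt(5)

# Compare against the limit definition with a small h
h = 1e-6
approx = (f(x0 + a * h, y0 + b * h) - f(x0, y0)) / h

print(exact)                       # ≈ 4.0249 (= 9/sqrt(5))
print(abs(exact - approx) < 1e-4)  # the two computations agree
```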
There is another form of the formula that we used to get the
directional derivative that is a little nicer and somewhat more compact. It is also a much more general formula that
will encompass both of the formulas above.
Let’s start with the second one and notice that we can write
it as follows,
$$D_{\vec u}f(x,y,z) = f_x\,a + f_y\,b + f_z\,c = \langle f_x, f_y, f_z \rangle \cdot \langle a, b, c \rangle$$
In other words we can write the directional derivative as a
dot product and notice that the second vector is nothing more than the unit
vector that gives the direction of change. Also, if we had used the version for
functions of two variables the third component wouldn’t be there, but other
than that the formula would be the same.
Now let’s give a name and notation to the first vector in
the dot product since this vector will show up fairly regularly throughout this
course (and in other courses). The gradient of f or gradient vector of f is defined to be,
$$\nabla f = \langle f_x, f_y, f_z \rangle \qquad \text{or} \qquad \nabla f = \langle f_x, f_y \rangle$$
Or, if we want to use the standard basis vectors the
gradient is,
$$\nabla f = f_x\,\vec i + f_y\,\vec j + f_z\,\vec k \qquad \text{or} \qquad \nabla f = f_x\,\vec i + f_y\,\vec j$$
The definition is only shown for functions of two or three
variables; however, there is a natural extension to functions of any number of
variables that we’d like.
With the definition of the gradient we can now say that the
directional derivative is given by,
$$D_{\vec u}f = \nabla f \cdot \vec u$$
where we will no longer show the variable and use this
formula for any number of variables.
Note as well that we will sometimes use the following notation,
$$D_{\vec u}f\left(\vec x\right) = \nabla f\left(\vec x\right) \cdot \vec u$$
where $\vec x = \langle x, y, z \rangle$ or $\vec x = \langle x, y \rangle$ as needed.
This notation will be used when we want to note the variables in some
way, but don’t really want to restrict ourselves to a particular number of
variables. In other words, $\vec x$ will be used to represent as many variables as
we need in the formula and we will most often use this notation when we are
already using vectors or vector notation in the problem/formula.
Let’s work a couple of examples using this formula of the directional
derivative.
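The gradient form of the formula is easy to sketch directly (the function and helper names below are our own example, not from the notes):

```python
import math

# A sample three-variable function f(x, y, z) = x e^y + z^2 with its gradient
# <f_x, f_y, f_z> computed by hand (our own example, not from the notes)
def grad_f(x, y, z):
    return (math.exp(y), x * math.exp(y), 2 * z)

def unit(v):
    """Scale v to length 1."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def directional_derivative(point, v):
    """D_u f = grad f(point) . u, where u is the unit vector along v."""
    u = unit(v)
    g = grad_f(*point)
    return sum(gi * ui for gi, ui in zip(g, u))

# Rate of change of f at (0, 0, 1) in the direction of v = <1, 2, 2>:
# grad f = <1, 0, 2>, u = <1/3, 2/3, 2/3>, so D_u f = 1/3 + 0 + 4/3 = 5/3
print(directional_derivative((0.0, 0.0, 1.0), (1, 2, 2)))
```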
Before proceeding let’s note that the first order partial
derivatives that we were looking at in the majority of the section can be
thought of as special cases of the directional derivatives. For instance, $f_x$ can be thought of as the directional
derivative of f in the direction of $\vec u = \langle 1, 0 \rangle$ or $\vec u = \langle 1, 0, 0 \rangle$,
depending on the number of variables that we’re working with. The same can be done for $f_y$ and $f_z$.
We will close out this section with a couple of nice facts
about the gradient vector. The first
tells us how to determine the maximum rate of change of a function at a point
and the direction that we need to move in order to achieve that maximum rate of
change.
Theorem
The maximum value of $D_{\vec u}f\left(\vec x\right)$ (and hence the maximum rate of change of the function $f\left(\vec x\right)$) is given by $\left\| \nabla f\left(\vec x\right) \right\|$ and will occur in the direction given by $\nabla f\left(\vec x\right)$.
Proof
From the gradient form of the directional derivative and the geometric interpretation of the dot product we have,
$$D_{\vec u}f\left(\vec x\right) = \nabla f\left(\vec x\right) \cdot \vec u = \left\| \nabla f\left(\vec x\right) \right\| \left\| \vec u \right\| \cos\theta = \left\| \nabla f\left(\vec x\right) \right\| \cos\theta$$
where $\theta$ is the angle between the gradient and $\vec u$ and we’ve used the fact that $\vec u$ is a unit vector. The largest possible value of $\cos\theta$ is 1 and it occurs when $\theta = 0$, i.e. when $\vec u$ points in the same direction as $\nabla f\left(\vec x\right)$. In that case the directional derivative is $\left\| \nabla f\left(\vec x\right) \right\|$, which is therefore the maximum rate of change.
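A small numerical illustration of the theorem, using a sample function of our own choosing (not from the notes): sweep unit vectors $\vec u = \langle \cos\theta, \sin\theta \rangle$ around the circle and confirm that $D_{\vec u}f$ is largest when $\vec u$ points along $\nabla f$.

```python
import math

# Sample function f(x, y) = x^2 + 3y with gradient <2x, 3>
# (our own example, not from the notes)
def grad(x, y):
    return (2 * x, 3.0)

gx, gy = grad(1.0, 0.0)   # gradient at the point (1, 0): <2, 3>

# Directional derivative in the unit direction <cos t, sin t>: D_u f = grad . u
def D(theta):
    return gx * math.cos(theta) + gy * math.sin(theta)

# Sweep 3600 directions around the circle and keep the largest rate of change
thetas = [k * 2 * math.pi / 3600 for k in range(3600)]
max_rate, best_theta = max((D(t), t) for t in thetas)

# The theorem says the max is |grad| = sqrt(13) ≈ 3.606, attained along the
# gradient, i.e. at the angle atan2(3, 2) ≈ 0.983 rad
print(round(max_rate, 3))
print(round(best_theta, 3), round(math.atan2(gy, gx), 3))
```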
Let’s take a quick look at an example.
Example 3 Suppose
that the height of a hill above sea level is given by $z = 1000 - 0.01x^2 - 0.02y^2$. If you are at the point $(60, 100)$ in what direction is the elevation changing
fastest? What is the maximum rate of
change of the elevation at this point?
Solution
First, you will hopefully recall from the Quadric Surfaces section that this is an
elliptic paraboloid that opens downward.
So even though most hills aren’t this symmetrical it will at least be
vaguely hill shaped and so the question makes at least a little sense.
Now on to the problem. There are a couple of questions to answer
here, but using the theorem makes answering them very simple. We’ll first need the gradient vector.
$$\nabla f(x,y) = \langle -0.02x, -0.04y \rangle$$
The maximum rate of change of the elevation will then
occur in the direction of
$$\nabla f(60, 100) = \langle -1.2, -4 \rangle$$
The maximum rate of change of the elevation at this point
is,
$$\left\| \nabla f(60, 100) \right\| = \sqrt{(-1.2)^2 + (-4)^2} = \sqrt{17.44} = 4.176$$
Before leaving this example let’s note that we’re at the
point $(60, 100)$ and the direction of greatest rate of change of
the elevation at this point is given by the vector $\langle -1.2, -4 \rangle$. Since both of the components are negative
it looks like the direction of maximum rate of change points up the hill
towards the center rather than away from the hill.
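A quick numeric check of this example’s computations (assuming the hill height $z = 1000 - 0.01x^2 - 0.02y^2$ used above):

```python
import math

# Gradient of the hill height z = 1000 - 0.01 x^2 - 0.02 y^2
# (the function assumed for this example)
def grad(x, y):
    return (-0.02 * x, -0.04 * y)

gx, gy = grad(60, 100)         # direction of fastest elevation gain
max_rate = math.hypot(gx, gy)  # maximum rate of change = |grad|

print((gx, gy))                # (-1.2, -4.0)
print(round(max_rate, 3))      # 4.176
```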

The second fact about the gradient vector that we need to
give in this section will be very convenient in some later sections.
Fact
The gradient vector $\nabla f(x_0, y_0)$ is orthogonal (or perpendicular) to the level curve $f(x,y) = k$ at the point $(x_0, y_0)$. Likewise, the gradient vector $\nabla f(x_0, y_0, z_0)$ is orthogonal to the level surface $f(x,y,z) = k$ at the point $(x_0, y_0, z_0)$.
Proof
We’re going to
do the proof for the $\mathbb{R}^3$ case.
The proof for the $\mathbb{R}^2$ case is identical. We’ll also need to get some notation out of the
way to make life easier for us. Let’s let S
be the level surface given by $f(x,y,z) = k$ and let $P = (x_0, y_0, z_0)$. Note as well that P will be on S.
Now, let C be any curve on S that contains P. Let $\vec r(t) = \langle x(t), y(t), z(t) \rangle$ be the vector equation for C and suppose that $t_0$ is the value of t such that $\vec r(t_0) = \langle x_0, y_0, z_0 \rangle$. In other words, $t_0$ is the value of t that gives P.
Because C lies on S we know that points on C
must satisfy the equation for S. Or,
$$f\bigl(x(t), y(t), z(t)\bigr) = k$$
Next, let’s use
the Chain Rule on this to get,
$$\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = 0 \qquad (4)$$
Notice that $\nabla f = \langle f_x, f_y, f_z \rangle$ and $\vec r\,'(t) = \langle x'(t), y'(t), z'(t) \rangle$ and so (4)
becomes,
$$\nabla f \cdot \vec r\,'(t) = 0$$
At $t = t_0$, this is,
$$\nabla f(x_0, y_0, z_0) \cdot \vec r\,'(t_0) = 0$$
This then tells
us that the gradient vector at P, $\nabla f(x_0, y_0, z_0)$,
is orthogonal to the tangent vector, $\vec r\,'(t_0)$,
to any curve C that passes through P and is on the surface S and so must also be orthogonal to
the surface S.

As we will see in later sections we will often need
vectors that are orthogonal to a surface or curve, and using this
fact we will know that all we need to do is compute a gradient vector and we
will get the orthogonal vector that we need.
We will see the first application of this in the next chapter.