0:00

[SOUND] So the first thing we need to do in this course on Linear Algebra is to

get a handle on vectors, which will turn out to be really useful for

us in solving those linear algebra problems.

That is the ones that are linear in their coefficients

such as most fitting parameters.

We're going to first step back and

look in some detail at the sort of things we're trying to do with data.

And why those vectors you first learned about in high school were even relevant?

And this will hopefully make all the work with vectors a lot more intuitive.

1:12

So we could fit it with some curve that

had two parameters, mu and sigma.

I would use an equation like this.

I'd call it f(x),

some function of x where x is the height,

= 1 over sigma root 2 pi times

the exponential of - (x - mu) squared

divided by 2 sigma squared.

So this equation only has two parameters, sigma here and mu, and it looks like this.

And it has an area here of 1,

because there's only 100% of people in the population as a whole.

Now don't worry too much about the form of the equation.

This is what happens, this is called the normal or Gaussian distribution.

And it's got a center of mu and a width of sigma.

And it's normalized, so that it's area is 1.

2:27

Imagine that we had guessed that the width was wider than it really is, but

keeping the area of 1.

So if we guessed that it was a fatter and probably a bit shorter distribution,

something like that, say.

So this one has something like that.

This one has a wider sigma, but it's got the same mu.

It'd be too high at the edges here, and too low in the middle.

So then we could add up the differences between all of our measurements, and

all of our estimates.

We've got all of these places where we underestimate here, and

all of these places where we overestimate here.

And we could add up those differences or, in fact,

the squares of them to get a measure of the goodness or badness of the fit.

And we'll look in detail at how we do that once we've done all the vectors

work, and once we've done actually all the calculus work, then we could plot how

that goodness varied as we change the fitting parameters, sigma and

mu, and we get a plot like this.

So if we had a correct value, our best possible value for mu here,

and our best possible value for the width, sigma here.

3:35

We could then plot, for a given value of mu and sigma, what the difference was.

So if we were at the right value, we'd get a value

of goodness where the sum of the squares of the differences was nought.

And if mu was too far over, if we had mis-estimated mu and

we got the distribution shifted over, so the width was right, but

we had some wrong value of mu there, that we get some value of all the sums of

the squares of the differences of goodness being some value here that was higher.

And it might be the same if we went over the other side and

we had some value there.

And if we were too wide, we'd get something there or too thin,

we'd get something that was too thin like that, something like that, say.

So we'd get some other value of goodness.

We could imagine plotting out all of the values of where we have the same

value of goodness or badness for different values of mu and sigma.

And we could then do that for some other value of badness, and

we might get a contour that looked like this, and

another contour that looked like this, and so on and so forth.

Now, say we don't want to compute the value of this goodness parameter for

every possible mu and sigma.

We just want to do it a few times, and

then find our way to the best possible fit of all.

4:55

Say we started off here with some guess that was too big a mu and

too small a width.

We thought people were taller than they really are,

and that they were tighter packed in their heights than they really are.

But what we could do is we could say well,

if I do a little move in mu and sigma, then does it get better or worse?

And if it gets better, well, we'll keep moving that direction.

So we could imagine making a vector of a change in mu and a change in sigma.

And we could have our original mu and sigma there.

And we could have a new value, mu prime, sigma prime, and

ask if that gives us a better answer?

If it's better there or if mu prime sigma prime took us over here?

If we were better or worse there, something like that.

5:44

Now actually, if we could find what the steepest way down the hill was,

then we could go down this set of contours, this sort of landscape here

towards the minimum point, towards the point where get the best possible fit.

And what we're doing here,

these are vectors, these are little moves around space.

They're not moves around a physical space,

they're moves around a parameter space, but it's the same thing.

So if we understand vectors and we understand how to get down hills,

that sort of curviness of this value of goodness, that's calculus.

Then once we got calculus and

vectors, we'll be able to solve this sort of problem.

So we can see that vectors don't have to be just geometric objects in the physical

order of space.

They can describe directions along any sorts of axes.

So we can think of vectors as just being lists.

If we thought of the space of all possible cars, for example.

So here's a car.

There's its back, there's its window, there's the front, something like that.

There's a car, there's the window.

We could write down in a vector all of the things about the car.

We could write down its cost in euros.

We could write down its emissions performance in grams of CO2 per

100 kilometers.

We could write down its Nox performance,

how much it polluted our city and killed people due to air pollution.

We could write down its Euro NCAP star rating, how good it was in a crash.

We could write down its top speed.

And write those all down in a list that was a vector.

That'd be more of a computer science view of vectors,

whereas the spatial view is more familiar from physics.

In my field, metallurgy, I could think of any alloy as being described by a vector

that describes all of the possible components,

all the compositions of that alloy.

Einstein, when he conceived relativity,

conceived of time as just being another dimension.

So space-time is a four dimensional space, three dimension of metres, and

one of time in seconds.

And he wrote those down as a vector of space-time of x, y, z,

and time which he called space-time.

When we put it like that, it's not so

crazy to think of the space of all the fitting parameters of a function, and

then of vectors as being things that take us around that space.

And what we're trying to do then is find the location in that space,

where the badness is minimized, the goodness is maximized, and

the function fits the data best.

If the badness surface here was like a contour map of a landscape, we're trying

to find the bottom of the hill, the lowest possible point in the landscape.

So to do this well, we'll want to understand how to work with vectors and

then how to do calculus on those vectors in order to find

gradients in these contour maps and minima and all those sorts of things.

Then we'll be able to go and do optimizations, enabling us to go and

work with data and do machine learning and data science.

8:39

What we've seen is the function we fit, whatever it is, has some parameters.

And we can plot how the quality of the fit, the goodness of the fit,

varies as we vary those parameters.

Moves around these fitting parameter space are then just vectors

in that space of fitting parameters.

And, therefore, we want to look at and revisit vector maths in order to be able

to build on that, and then do calculus, and then do machine learning.

[SOUND]