0:20

This is not intended to replace the written assignment description, but to give you an overview and some tips to get you going. So part 1 is Non-Personalized Recommendation, and the assignment is about computing in a spreadsheet. You can use pretty much any spreadsheet: Google Sheets, Microsoft Excel, OpenOffice Calc. Just about any of them will have all of the functions that you would need to complete this work. And you're given a 20 x 20 ratings matrix as a .csv file to download as a start. Let's just take a quick look at that matrix.

1:00

And I opened it up in Microsoft Excel, and what you can see here is that it has 21 rows. The first one is a header, followed by a set of user numbers. In case you're interested, these are actually users who took an earlier version of this course and submitted their ratings of a number of movies. Across the top, you see a set of individual movies, so number 260 is Star Wars: Episode IV - A New Hope. We come in here, number 356 is Forrest Gump, and these are the movies that they rated. We go all the way out. You'll see that again we have a total of 21 columns. There's the heading column that has the user, and then 20 more.

Inside the spreadsheet, any cell with a number in it is a rating on a five-star scale, from 1, intended to be low, "I didn't like this movie," to 5, which means "I liked it very much." We can see, going across user number 755, that this user did not very much like the original Star Wars but liked Return of the Jedi, and was not a big fan of Forrest Gump but liked Silence of the Lambs. So these are all real people's ratings. The other thing you'll notice is that many of the cells are empty. A blank cell means the person did not rate that movie, and presumably may not have seen it.
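
For anyone who wants to see that layout outside a spreadsheet, here is a minimal Python sketch that reads a file shaped like the one described: a header row of movies, one row per user, and blank cells for missing ratings. The sample rows, the user labels, the header text, and the `load_ratings` helper are all made up for illustration; the real file has 20 users and 20 movies, and its exact header format may differ.

```python
import csv
import io

# Made-up sample in the same shape as the assignment file (not the real data).
SAMPLE_CSV = """User,260: Star Wars,356: Forrest Gump
u1,1,2
u2,5,
u3,,
"""

def load_ratings(text):
    """Return {movie: {user: rating}}, skipping blank cells."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    movies = header[1:]                      # first column holds the user id
    ratings = {movie: {} for movie in movies}
    for row in reader:
        user = row[0]
        for movie, cell in zip(movies, row[1:]):
            if cell.strip():                 # blank cell = not rated
                ratings[movie][user] = float(cell)
    return ratings

ratings = load_ratings(SAMPLE_CSV)
print(len(ratings["356: Forrest Gump"]))     # number of users who rated it
```

Keeping blanks out of the structure entirely, rather than storing them as zero, matters later: a zero would drag down averages and correlations, while a missing entry is simply skipped.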

2:40

And you're going to compute a variety of outputs, submitting to Coursera the top five movies for each of the things we ask you to calculate, where top five is measured numerically from your scores. So if you compute something that turns out to be in a range from one to five, then go back and either sort those numbers or look for the top five, and list them in order. You're going to submit both the list of the top five and the scores that go with them, and these will be graded for you automatically. You'll get back your results.
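
The "sort and take the top five" step can be sketched in Python as follows. The movie names and scores are invented, and `top_n` is a hypothetical helper; in the spreadsheet you would get the same result by sorting the row of scores.

```python
# Made-up scores for illustration; real values come from your own spreadsheet.
scores = {"Movie A": 3.9, "Movie B": 4.4, "Movie C": 2.8, "Movie D": 4.1,
          "Movie E": 3.2, "Movie F": 4.7, "Movie G": 3.5}

def top_n(scores, n=5):
    """Return the n highest-scoring (movie, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

for movie, score in top_n(scores):
    print(movie, score)
```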

3:18

So what are these specific computations? We're going to have you look at the mean rating, the average of all of the ratings, and the number of ratings, which could be a measure of the popularity of the movie: how many of these people saw it, whether they said they liked it or not. Then the percentage of ratings that are positive, where positive is defined as greater than or equal to four for this purpose. So if something had 12 ratings, and eight of them were fours and fives, and four of them were ones, twos, and threes, then that percentage would be two-thirds, or 0.6667, or about 67%.
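
Here is a small Python sketch of those three measures for a single movie, using invented ratings arranged to match the 12-rating example (eight ratings of 4 or 5, four ratings of 1 to 3). In a spreadsheet the same ideas map onto AVERAGE, COUNT, and a conditional count divided by COUNT over the movie's column; the variable names here are illustrative only.

```python
# One movie's column: made-up ratings matching the 12-rating example above
# (eight ratings of 4 or 5, four ratings of 1 to 3); None marks a blank cell.
column = [5, 4, None, 4, 5, 2, 4, None, 1, 5, 3, 4, 3, 5]

rated = [r for r in column if r is not None]   # blanks are left out entirely

mean_rating = sum(rated) / len(rated)
rating_count = len(rated)
positive_share = sum(1 for r in rated if r >= 4) / len(rated)

print(rating_count)                 # 12
print(round(positive_share, 4))     # 0.6667
```

Note that the denominator for the percentage is the number of ratings, not the number of users: the two blank cells do not count against the movie.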

And then you're going to do some product associations, starting from the people who rated a particular movie. We'll tell you that movie in the assignment, and here's a secret about doing online courses. When we record videos, there are all sorts of things we don't tell you, so that we can change them. And so, over time, we've changed which movie we've asked people to calculate for, to give people a fresh set of results. So read the assignment for the details.

You're going to do this two ways. One is using the simple product association formula from the lesson on product association: the count of users who rated both the selected movie and each other movie, divided by the count of users who rated the selected movie. In other words, how often do these co-occur? And then we're going to use the lift formula, where you take the probability (or count, which works just as well here) of the two movies together, divided by the product of the probabilities of the movies apart.
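
The two formulas can be sketched in Python like this. The data is a toy example (sets of made-up user labels per movie), and the names `raters`, `simple_association`, and `lift` are illustrative, not taken from the assignment.

```python
# Made-up data: the set of users who rated each movie (not the assignment data).
raters = {
    "Selected": {"u1", "u2", "u3", "u4"},
    "Movie A":  {"u1", "u2", "u5"},
    "Movie B":  {"u3", "u4", "u5", "u6", "u7", "u8"},
}
n_users = 8   # total number of users in this toy example

def simple_association(x, y):
    """Share of x's raters who also rated y: count(x and y) / count(x)."""
    return len(raters[x] & raters[y]) / len(raters[x])

def lift(x, y):
    """P(x and y) / (P(x) * P(y)), with probabilities taken as count / n_users."""
    p_xy = len(raters[x] & raters[y]) / n_users
    p_x = len(raters[x]) / n_users
    p_y = len(raters[y]) / n_users
    return p_xy / (p_x * p_y)

for movie in ("Movie A", "Movie B"):
    print(movie, simple_association("Selected", movie), lift("Selected", movie))
```

In this toy data both movies have the same simple association (0.5), but the lift is above 1 for Movie A and below 1 for Movie B, because Movie B is rated by most users anyway. If raw counts are used in place of probabilities, every lift value is divided by the same constant (the number of users), so the ordering is unchanged but the reported scores differ; the written assignment is the place to check which form is expected.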

5:04

Both of those are going to be done separately; that'll give you a chance to see the difference. And we have one last computation, which has a nice built-in function for it: correlation. This is a Pearson correlation, built into the spreadsheets as CORREL, and you're going to look at the correlation between a selected movie and each of the other movies and find the five that correlate most strongly with it.
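
For reference, here is a minimal Python sketch of a Pearson correlation between two rating columns. It assumes that rows where either movie is blank are skipped, which is my understanding of how the spreadsheet function treats empty cells, but that is worth confirming in your own spreadsheet. The `pearson` helper and the two columns are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over the positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    # Undefined (division by zero) if either column has no variation.
    return cov / math.sqrt(var_x * var_y)

# Made-up columns for two movies; None marks a blank cell.
selected = [5, 4, None, 2, 1, 3]
other    = [4, 5, 3, None, 2, 3]
print(round(pearson(selected, other), 4))
```

With only four users in common, as here, the result rests on very little evidence, which is worth keeping in mind when a sparse column produces a surprisingly high correlation.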

5:30

To take you through a little bit, in case you're not experienced with doing these in a spreadsheet, I want to give you a few tips and show them to you in the spreadsheet as we go. And probably the most important thing is to understand how you do formulas and calculations. All of those in our standard spreadsheets start with an equals sign.

5:50

So if I came in and said I really want to know what the average is of the ratings for this movie, I can say give me the average of, and the beauty of spreadsheets is that I don't have to actually type the cell numbers. I can just select this range, close the parenthesis, and it will show me this is 3.2667. Nice to know. There's a bunch of other formulas I can use here. If I wanted to know the correlation between two movies, I can do this with CORREL and put two ranges in with a comma in between. I could even do some interesting things. So let's say I created the average here too, and

Â 7:32

And I'm going to have you look that up as you go in your spreadsheet, but the functionality of it is to say: I have a condition, and I only want to count the cases that meet that condition. Count if it's greater than or equal to four. Count if the user is something or another.
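
The idea of a conditional count (COUNTIF in the common spreadsheets) can be sketched in Python as below. The `count_if` helper and the ratings are made up for illustration.

```python
def count_if(values, condition):
    """Count the non-blank values that satisfy the condition."""
    return sum(1 for v in values if v is not None and condition(v))

# Made-up column of ratings; None marks a blank cell.
column = [5, 3, None, 4, 2, 4, None, 1]

print(count_if(column, lambda r: r >= 4))   # ratings of 4 or higher
print(count_if(column, lambda r: r == 1))   # ratings equal to 1
```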

Â 8:04

So let me come back to the spreadsheet here and say, you know, here's my formula for the average, and what would happen if I just copy it? I'm using Ctrl+C, but I could also choose Copy as an edit operation, and I paste it into here. You'll notice the number changed. And in fact, the formula changed.

Â 8:29

Here I had the average from C2 to C21. Now it's the average of D2 to D21. Whenever you copy a formula, by default its cell references shift along with it. In fact, if I do this, and I copy this formula over here, this is now going to be this minus this: how much better do people like Return of the Jedi than Forrest Gump? That may not be what I intended.

Â 8:58

And when I don't want these things to move... I should point out that this also happens if I step downwards. If I copy this formula down here, the average is going to change. And the reason that the average changes is that the cell here at the top suddenly got excluded. Now it's the average from D3 to D22.

Â 10:03

This turns out to be really useful as you're going through and trying to create formulas. So, if what I meant was not that I really care about the difference between adjacent movies, let me just put this formula that we have all the way across.

Â 10:40

But let's say what I really want to know is this: if I treat Star Wars as the greatest movie ever (some people feel that way), I want to see how Star Wars compares to every other movie in the world. Then what I really care about is that I'm going to lock this, so that I'm always comparing against column B.

Â 11:10

But the C I'm willing to have change, as I copy across, to whatever column it moves over to. Now, I actually think it's probably a mistake to write my formula that way. I think the way to think about this is that in every column, I'm going to say: how much better was Star Wars than that movie? And in fact, Star Wars is not better than Star Wars at all. But if I copy this over here, Star Wars is a quarter of a star better than Return of the Jedi.
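
The "one locked reference, one moving reference" pattern looks like this as a Python sketch. The mean ratings are invented (chosen only so that the gap to Return of the Jedi is a quarter of a star, as in the demonstration), and the cell names in the comments assume the averages sit in row 22, which may not match your sheet.

```python
# Made-up mean ratings per movie (not the assignment's values).
means = {"Star Wars": 4.0, "Return of the Jedi": 3.75, "Forrest Gump": 3.5}

baseline = means["Star Wars"]          # the "locked" reference, like $B22
differences = {movie: baseline - mean  # the moving reference, like B22, C22, ...
               for movie, mean in means.items()}
print(differences)
```

A negative difference would mark a movie that people rated higher than Star Wars on average.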

Â 11:46

That must be experimental error, because I told you Star Wars was the greatest movie ever, right? But as we go through, we can see there are some movies that people liked more than Star Wars. Well, exactly one.

Â 12:05

Maybe when you're done with this assignment you can treat yourself to a video and go watch it. But these are the kinds of things we can do with a spreadsheet.

So, the tips. Remember to put formulas in cells using the equals sign. You can copy formulas, and you will find this really useful as you go forward. Rows and columns lock separately. If you put D21 and copy it down and to the right, it will become E22. If you want to lock the 21 even as you move it up and down, write it as D$21. If you want to lock the D even as you move left and right, use $D21. And if you want to lock both of them, because this is the cell you're always referring to, put in both dollar signs: $D$21.
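
To make the locking rules concrete, here is a small Python sketch that mimics how a reference changes when a formula is copied. It is a simplification for illustration, not how any spreadsheet is actually implemented: `shift_reference` is a hypothetical helper that handles only single-letter columns and does not guard against moving past column Z.

```python
import re

def shift_reference(ref, rows, cols):
    """Mimic how a single-letter-column cell reference changes when copied."""
    match = re.fullmatch(r"(\$?)([A-Z])(\$?)(\d+)", ref)
    col_lock, col, row_lock, row = match.groups()
    if not col_lock:                       # column moves unless locked with $
        col = chr(ord(col) + cols)
    if not row_lock:                       # row moves unless locked with $
        row = str(int(row) + rows)
    return f"{col_lock}{col}{row_lock}{row}"

for ref in ("D21", "D$21", "$D21", "$D$21"):
    print(ref, "->", shift_reference(ref, rows=1, cols=1))
```

Copying one cell down and one cell to the right, the four forms become E22, E$21, $D22, and $D$21 respectively.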

Okay, part two: demographics. This is going to be a small part of the assignment because, as we've discussed, many of the techniques that you would need to use you will not have learned yet. But we do want you to start exploring the idea of demographics, and so we're going to show you a second spreadsheet. I'm not going to bring it up now, because all it has is one additional column that has gender, male or female, associated with each of the people providing ratings. Obviously, in the real world, you'd have more data, and you'd have more unknowns.

And we're going to have you compute, as given in the assignment, several of the same outputs separately for male and female users. The final question in the assignment asks you to assess whether it would appear to be valuable, instead of giving overall averages, to give gender-based, stereotyped recommendations. We're going to do that in a number of different ways. You're going to look at to what extent the genders have different average ratings, and to what extent they differ systematically in what they think about the movies. And we'll do just a little bit of computation that gives you a head start in thinking about the evaluation that's coming up in a later course, as we look at a couple of examples of how we might start to estimate whether one of these is better than the other.
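
Computing an output separately per gender amounts to splitting the rows on the extra column first. Here is a minimal Python sketch for one movie's mean rating; the rows, the "F"/"M" labels, and the `mean_by_group` helper are made up for illustration, and the actual coding of the gender column in the second spreadsheet may differ.

```python
from statistics import mean

# Made-up rows: (gender, rating for one movie); None marks a blank cell.
rows = [("F", 5), ("M", 3), ("F", 4), ("M", None), ("F", None), ("M", 4)]

def mean_by_group(rows):
    """Average the non-blank ratings separately for each group label."""
    groups = {}
    for label, rating in rows:
        if rating is not None:
            groups.setdefault(label, []).append(rating)
    return {label: mean(values) for label, values in groups.items()}

by_gender = mean_by_group(rows)
print(by_gender)
print(by_gender["F"] - by_gender["M"])   # gap between the two group means
```

With groups this small, a gap between the two means can easily come from one or two raters, which is part of what the final question asks you to weigh.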
