0:52

Suppose you run a shipping service,

Â so, you know, users come and ask

Â you to help ship their package from

Â location A to location B and suppose

Â you run a website, where users

Â repeatedly come and they

Â tell you where they want

Â to send the package from, and

Â where they want to send it to

Â (so the origin and destination) and

Â your website offers to ship the package

Â for some asking price,

Â so I'll ship your package for $50,

Â I'll ship it for $20.

Â And based on the price

Â that you offer to the users,

Â the users sometimes chose to use a shipping service;

Â that's a positive example and

Â sometimes they go away and

Â they do not choose to

Â purchase your shipping service.

Â So let's say that we want

Â a learning algorithm to help us

Â to optimize what is the asking

Â price that we want to offer to our users.

Â And specifically, let's say we

Â come up with some sort of features

Â that capture properties of the users.

Â If we know anything about the demographics,

Â they capture, you know, the origin and

Â destination of the package, where they want to ship the package.

Â And what is the price

Â that we offer to them for shipping the package.

Â and what we want to do

Â is learn what is the

Â probability that they will

Â elect to ship the

Â package, using our

Â shipping service given these features, and

Â again just as a reminder these

Â features X also captures the price that we're asking for.

Â And so if we could

Â estimate the chance that they'll

Â agree to use our service

Â for any given price, then we

Â can try to pick

Â a price so that they

Â have a pretty high probability of

Â choosing our website while simultaneously

Â hopefully offering us a

Â fair return, offering us

Â a fair profit for shipping their package.

Â So if we can learn this property

Â of y equals 1 given

Â any price and given the other

Â features we could really

Â use this to choose appropriate

Â prices as new users come to us.

Â So in order to model

Â the probability of y equals 1,

Â what we can do is use

Â logistic regression or neural

Â network or some other algorithm like that.

Â But let's start with logistic regression.

Â 2:57

Now if you have a

Â website that just runs continuously,

Â here's what an online learning algorithm would do.

Â I'm gonna write repeat forever.

Â This just means that our website

Â is going to, you know, keep on

Â staying up.

Â What happens on the website is

Â occasionally a user

Â will come and for

Â the user that comes we'll get

Â some x,y pair corresponding to

Â a customer or to a user on the website.

Â So the features x are, you

Â know, the origin and destination specified

Â by this user and the price

Â that we happened to offer to

Â them this time around, and

Â y is either one or

Â zero depending one whether or

Â not they chose to

Â use our shipping service.

Â Now once we get this {x,y}

Â pair, what an online

Â learning algorithm does is then

Â update the parameters theta

Â using just this example

Â x,y, and in particular

Â we would update my parameters theta

Â as Theta j get updated as Theta j

Â minus the learning rate alpha times

Â my usual gradient descent

Â rule for logistic regression.

Â So we do this for j

Â equals zero up to n,

Â and that's my close curly brace.

Â So, for other learning algorithms

Â instead of writing X-Y, right, I

Â was writing things like Xi,

Â Yi but

Â in this online learning setting

Â where actually discarding the notion

Â of there being a fixed training

Â set instead we have an algorithm.

Â Now what happens as we get

Â an example and then we

Â learn using that example like

Â so and then we throw that example away.

Â We discard that example and we

Â never use it again and

Â so that's why we just look at one example at a time.

Â We learn from that example.

Â We discard it.

Â Which is why, you know, we're

Â also doing away with this

Â notion of there being this

Â sort of fixed training set indexed by i.

Â And, if you really run

Â a major website where you

Â really have a continuous stream

Â of users coming, then this

Â sort of online learning algorithm

Â is actually a pretty reasonable algorithm.

Â Because of data is essentially

Â free if you have so

Â much data, that data

Â is essentially unlimited then there

Â is really may be no

Â need to look at a

Â training example more than once.

Â Of course if we had only

Â a small number of users then

Â rather than using an online learning

Â algorithm like this, you might

Â be better off saving away all

Â your data in a fixed training

Â set and then running some algorithm over that training set.

Â But if you really have a continuous

Â stream of data, then an

Â online learning algorithm can be very effective.

Â I should mention also that one

Â interesting effect of this sort

Â of online learning algorithm is

Â that it can adapt to changing user preferences.

Â 5:51

And in particular, if over

Â time because of changes in

Â the economy maybe users

Â start to become more price

Â sensitive and willing to pay,

Â you know, less willing to pay high prices.

Â Or if they become less price sensitive and they're willing to pay higher prices.

Â Or if different things

Â become more important to users,

Â if you start to have new

Â types of users coming to your website.

Â This sort of online learning algorithm

Â can also adapt to changing

Â user preferences and kind

Â of keep track of what your

Â changing population of users

Â may be willing to pay for.

Â And it does that because if

Â your pool of users changes,

Â then these updates to your

Â parameters theta will just slowly adapt

Â your parameters to whatever your

Â latest pool of users looks like.

Â Here's another example of a

Â sort of application to which you might apply online learning.

Â this is an application in product

Â search in which we want to

Â apply learning algorithm to learn

Â to give good search listings to a user.

Â Let's say you run an online

Â store that sells phones - that

Â sells mobile phones or sells cell phones.

Â And you have a user interface

Â where a user can come to

Â your website and type in the

Â query like "Android phone 1080p camera".

Â So 1080p is a type

Â of a specification for a

Â video camera that you might

Â have on a phone, a cell phone, a mobile phone.

Â Suppose, suppose we have a hundred phones in our store.

Â And because of the way our

Â website is laid out, when

Â a user types in a query,

Â if it was a search query, we

Â would like to find a

Â choice of ten different phones to

Â show what to offer to the user.

Â What we'd like to do is have

Â a learning algorithm help us figure

Â out what are the ten phones

Â out of the 100 we

Â should return the user in response to

Â a user-search query like the one here.

Â Here's how we can go about the problem.

Â 7:37

For each phone and given

Â a specific user query; we

Â can construct a feature vector

Â X. So the feature

Â vector X might capture different properties of the phone.

Â It might capture things like,

Â how similar the user search query is in the phones.

Â We capture things like how many

Â words in the user search

Â query match the name of

Â the phone, how many words

Â in the user search query match the description of the phone and so on.

Â So the features x capture

Â properties of the phone and

Â it captures things about how

Â similar or how well

Â the phone matches the user query along different dimensions.

Â What we like to do is

Â estimate the probability that a

Â user will click on the

Â link for a specific phone,

Â because we want to show

Â the user phones that they

Â are likely to want to

Â buy, want to show the user

Â phones that they have high

Â probability of clicking on in the web browser.

Â So I'm going to define y equals

Â one if the user clicks on

Â the link for a phone and

Â y equals zero otherwise and

Â what I would like to do is

Â learn the probability the user

Â will click on a specific

Â phone given, you know,

Â the features x, which capture properties

Â of the phone and how well the query matches the phone.

Â To give this problem a name

Â in the language of

Â people that run websites like

Â this, the problem of learning this is

Â actually called the problem of

Â learning the predicted click-through rate, the predicted CTR.

Â It just means learning the probability

Â that the user will click on

Â the specific link that you

Â offer them, so CTR is

Â an abbreviation for click through rate.

Â And if you can estimate the

Â predicted click-through rate for any

Â particular phone, what we

Â can do is use this to

Â show the user the ten phones

Â that are most likely to click on,

Â because out of the hundred phones,

Â we can compute this for

Â each of the 100 phones and

Â just select the 10 phones

Â that the user is most likely to click on,

Â and this will be a pretty reasonable

Â way to decide what ten results to show to the user.

Â Just to be clear, suppose that

Â every time a user does

Â a search, we return ten results

Â what that will do is it

Â will actually give us ten

Â x,y pairs, this actually

Â gives us ten training examples every

Â time a user comes to

Â our website because, because for

Â the ten phone that we chose

Â to show the user, for each

Â of those 10 phones we get

Â a feature vector X, and

Â for each of those 10 phones we

Â show the user we will also

Â get a value for y, we

Â will also observe the value

Â of y, depending on whether

Â or not we clicked on that

Â url or not and

Â so, one way to run a

Â website like this would be to

Â continuously show the user,

Â you know, your ten best guesses for

Â what other phones they might like

Â and so, each time a user

Â comes you would get ten

Â examples, ten x,y pairs,

Â and then use an online

Â learning algorithm to update the

Â parameters using essentially 10

Â steps of gradient descent on these

Â 10 examples, and then

Â you can throw the data away, and

Â if you really have a continuous

Â stream of users coming to

Â your website, this would be

Â a pretty reasonable way to learn

Â parameters for your algorithm

Â so as to show the ten phones

Â to your users that may

Â be most promising and the most likely to click on.

Â So, this is a product search

Â problem or learning to rank

Â phones, learning to search for phones example.

Â So, I'll quickly mention a few others.

Â One is, if you have

Â a website and you're trying to

Â decide, you know, what special

Â offer to show the user,

Â this is very similar to phones,

Â or if you have a

Â website and you show different users different news articles.

Â So, if you're a news aggregator

Â website, then you can

Â again use a similar system to

Â select, to show to

Â the user, you know, what

Â are the news articles that they

Â are most likely to be interested

Â in and what are the news articles that they are most likely to click on.

Â Closely related to special offers, will we profit from recommendations.

Â And in fact, if you have

Â a collaborative filtering system, you

Â can even imagine a collaborative filtering

Â system giving you additional

Â features to feed into a

Â logistic regression classifier to try

Â to predict the click through

Â rate for different products that you might recommend to a user.

Â Of course, I should say that

Â any of these problems could also

Â have been formulated as a

Â standard machine learning problem, where you have a fixed training set.

Â Maybe, you can run your

Â website for a few days and

Â then save away a training set,

Â a fixed training set, and run

Â a learning algorithm on that.

Â But these are the actual

Â sorts of problems, where you do

Â see large companies get so

Â much data, that there's really

Â maybe no need to save away

Â a fixed training set, but instead

Â you can use an online learning algorithm to just learn continuously.

Â from the data that users are generating on your website.

Â 12:05

So, that was the online

Â learning setting and as we

Â saw, the algorithm that we apply to

Â it is really very similar

Â to this schotastic gradient descent

Â algorithm, only instead of

Â scanning through a fixed

Â training set, we're instead getting

Â one example from a user,

Â learning from that example, then

Â discarding it and moving on.

Â And if you have a continuous

Â stream of data for some application,

Â this sort of algorithm may be

Â well worth considering for your application.

Â And of course, one advantage of

Â online learning is also that

Â if you have a changing pool

Â of users, or if the

Â things you're trying to predict are

Â slowly changing like your user

Â taste is slowly changing, the online

Â learning algorithm can slowly

Â adapt your learned hypothesis to

Â whatever the latest sets of

Â user behaviors are like as well.

Â